[kepler-dev] Re: zero-length arrays
Bertram Ludaescher
ludaesch at sdsc.edu
Thu Aug 12 02:06:46 PDT 2004
Tobin:
Great observation with the one minute exec time for the "expensive
concat" vs the more efficient version.
I'm still surprised at the need to create a copy for C when doing
C := concat(A,B).
In Prolog, for example, and I think also in Lisp, when you do
C := concat(A,B)
(here syntax = abstract)
then you create a copy ONLY of the first of the two lists, the second
list is untouched. The reason that this is correct is because lists
are represented as linked lists. So you can reuse all of B, but not A
(since the last list pointer in A is NIL, and rewiring it to point to
B would invalidate all references to A; cf. how the unsafe "nconc"
works in Lisp vs the clean "append"). For A you have to make a fresh
copy and let its last pointer be a pointer to the start of B.
All this is declarative (no side effects), neat and halfway efficient
(only A is copied not B).
If the default implementation of arrays/lists in Java is consecutive
memory cells (as opposed to linked lists) then I'm not surprised at
the extra overhead...
Note (for experts only ;-) -- in Prolog there is a way to get
declarative (a la append, not a la destructive nconc) and very
efficient O(1) list append! It's using a "trick" called "difference
lists". I'm not sure whether Haskell et al have a similar O(1)
append...
Bertram
PS here is the trick for O(1) append:
http://cblpc0142.leeds.ac.uk/~paul/prologbook/node180.html
>>>>> "TF" == Tobin Fricke <tobin at splorg.org> writes:
TF>
TF> On Wed, 11 Aug 2004, Shawn Bowers wrote:
>> Isn't this last point just an implementation detail? Ultimately, a list
>> is the "abstract" data structure, and array is just a common programming
>> language implementation for a list-like structure. The more general
>> construct is really "collection" for which bags (mult-sets), sets, and
>> lists are specific types of collections.
TF>
TF> That's completely true. But the underlying implementation determines the
TF> efficiency of operations on the structure, and therefore the algorithms
TF> that you'll choose to use.
TF>
TF> Java strings are sort of "array-like" in that, to concatenate two strings,
TF> you have to make a new string of the desired length and copy the original
TF> strings into that. If you want to concatenate a large number of strings,
TF> it takes an incredibly long time, since all these intermediate copies are
TF> floating around. Try this:
TF>
TF> sum( map( function(x:int)(x.toString), [1:1:10000].toArray ) )
TF>
TF> It takes about a minute (!) to execute in the stock Ptolemy. I added
TF> special code to handle the case of sum({string}) by using a StringBuffer
TF> to accumulate the intermediate results (a "list-like" implementation), and
TF> the execution time was reduced to approximately one second.
TF>
TF> Having an 'ordered collection' data type where appending is expensive
TF> makes a lot of things difficult, notably just about any lisp idiom. But
TF> it's possible that the right avenue IS just to create a decent set of
TF> built-in primitives, so that these expensive implementations are not
TF> necessary, and therefore avoid language creep.
TF>
>> Also, it isn't clear to me that an empty record is a useful structure,
>> whereas I definately see how an empty array (or more specifically an empty
>> list) is useful.
TF>
TF> That's a good point. But I don't think we'd want to rule out empty
TF> records entirely... who knows when they might be useful? (Also, the empty
TF> record is a good type signature of "a record of some kind.")
TF>
TF> But it seems that Ptolemy allows empty records: intersect({foo=0},{bar=0}) !
TF>
TF> Also seems like a good reason to allow empty records.
TF>
TF> Tobin
TF> _______________________________________________
TF> kepler-dev mailing list
TF> kepler-dev at ecoinformatics.org
TF> http://www.ecoinformatics.org/mailman/listinfo/kepler-dev
More information about the Kepler-dev
mailing list