HansHalverson 2 days ago

Author here - very cool to see this get posted! Thank you @ivankra for adding this to https://github.com/ivankra/javascript-zoo and running those benchmarks, I really appreciate it!

This started as a hobby project that I've ended up putting a lot of time into over the last three years chasing completeness and performance.

bcardarella 2 days ago

Just a small comparison, compiled for release:

Boa: 23M Brimstone: 6.3M

I don't know if closing the gap on features with Boa and hardening for production use will also bloat the compilation size. Regardless, passing 97% of the spec at this size is pretty impressive.

  • jerf 2 days ago

    It looks like Boa has Unicode tables compiled inside of itself: https://github.com/boa-dev/boa/tree/main/core/icu_provider

    Brimstone does not appear to.

    That covers the vast bulk of the difference. The ICU data is about 10.7MB in the source (boa/core/icu_provider) and may grow or shrink by some amount in the compiling.

    I'm not saying it's all the difference, just the bulk.

    There's a few reasons why svelte little executables with small library backings aren't possible anymore, and it isn't just ambient undefined "bloat". Unicode is a big one. Correct handling of unicode involves megabytes of tables and data that have to live somewhere, whether it's a linked library, compiled in, tables on disks, whatever. If a program touches text and it needs to handle it correctly rather than just passing it through, there's a minimum size for that now.

    • HansHalverson 2 days ago

      Brimstone does embed Unicode tables, but a smaller set than Boa embeds: https://github.com/Hans-Halverson/brimstone/tree/master/icu.

      Brimstone does try to use the minimal set of Unicode data needed for the language itself. But I imagine much of the difference with Boa is because of Boa's support for the ECMA-402 Internationalization API (https://tc39.es/ecma402/).

      • nekevss 2 days ago

        Yeah, the majority of the difference is from the Unicode data for Intl along with probably the timezone data for Temporal.

        • nicoburns 2 days ago

          Is it possible to build Boa without these APIs?

          • nekevss 2 days ago

            For the engine, the answer is yes, Intl and Temporal are feature flagged due to the dependencies. What I suspect they’re comparing above is the CLIs, which is completely different than the engine. I’d have to double check for the CLI. If I recall correctly, we include all features in the CLI by default.

    • ambicapter 2 days ago

      Unicode is everywhere though. You'd think there'd be much greater availability of those tables and data and that people wouldn't need to bundle it in their executables.

      • nicoburns 2 days ago

        Unfortunately operating systems don't make the raw unicode data available (they only offer APIs to query it in various ways). Until they do we all have to ship it separately.

        • lifthrasiir a day ago

          For some OSes like Windows, some relevant APIs can be indeed used to reconstruct those tables. I found that this is in fact viable for character encoding tables, only requiring a small table for fixes in most cases.

        • pabs3 2 days ago

          Debian has a unicode-data package, so you can just depend on it.

    • rixed 2 days ago

      I was curious to see what that data consisted of and apparently that's a lot of translations, like the name of all possible calendar formats in all possible languages, etc. This seems useless in the vast majority of use cases, including that of a JS interpreter. Looks to me like the typical output of a committee that's looking too hard to extend its domain.

      Disclaimer: I never liked unicode specs.

      • necovek 2 days ago

        Unicode is an attempt to encode the world's languages: there is not much to like or dislike about it, it only represents the reality. Sure, it has a number of weird details, but if anything, it's due to the desire to simplify it (like Han unification or normal forms).

        Any language runtime wanting to provide date/time and string parsing functions needs access to the Unicode database (or something of comparable complexity and size).

        Saying "I don't like Unicode" is like saying "I don't like the linguistic diversity in the world": I mean sure, OK, but it's still there and it exists.

        Though note that date-time, currency, number, street etc. formatting is not "Unicode" even if provided by ICU: this is similarly defined by POSIX as "locales", and GNU libc probably has the richest collection of locales outside of ICU.

        There are also many non-Unicode collation tables (think phonebook ordering that's different for each country and language): so no good sort() without those either.

        • rixed a day ago

          I am not questioning the goal of representing all the fine details of every possible language and currency and calendar in use anywhere at any time in the universe, that's a respectable achievement. I'm discussing the process that led to a programming language interpreter needing, according to the comment I was replying to, to embed that trove of data.

          Most of us are not using computers to represent subtle variants of those cultural artifacts and therefore they should be left in some specialized libraries.

          Computers are symbolic machines, after all, and many times we would be as good using only 16 symbols and typing our code on a keyboard with just that many keys. We can't have anything but 64bits floats in JS, but somehow we absolutely need to be able to tell between the "peso lourd argentin (1970–1983)" and the "peso argentin (1881–1970)"? And that to display a chemical concentration in millimole per liter in German one has to write "mmol/l"?

          I get it, the symbolic machines need to communicate with humans, who use natural languages written in all kinds of ways, so it's very nice to have a good way to output and input text. We wanted that way to not favor any particular culture and I can understand that. But how you get from there to the amount of arcane specialized minute details in the ICU dataset is questionable.

        • animuchan a day ago

          > Saying "I don't like Unicode" is like saying "I don't like the linguistic diversity in the world": I mean sure, OK, but it's still there and it exists.

          Respectfully disagree, linguistic diversity isn't by definition impossible to create a good abstraction on top of; I think that it's more of a failure of this particular attempt.

          • necovek a day ago

            Care to point out a -- by your definition -- successful attempt to do it?

        • xeonmc 2 days ago

          Does that include emojis?

          • jcranmer 2 days ago

            Emojis are complicated from a font rendering perspective. But from a string processing perspective, they're generally going to be among the simplest characters: they don't have a lot of complex properties with a lot of variation between individual characters. Compare something like the basic Latin characters, where the mappings for precomposed characters are going to vary wildly from 'a' to 'b' to 'c', etc., whereas the list of precomposed characters for the emoji blocks amounts to "none."

            • necovek 2 days ago

              Agreed!

              FWIW, they are not even "complicated" from a font rendering perspective: they're simple non-combining characters and they are probably never used in ligatures either (though nothing really stops you; just like you can have locale-specific variants with locl tables). It's basically "draw whatever is in a font at this codepoint".

              Yes, if you want to call them out based on Unicode names, you need to have them in the database, and there are many of them, so a font needs to have them all, but really, they are the simplest of characters Unicode could have.

              • overfeed 2 days ago

                > they're simple non-combining characters

                Skin-tone emojis are combined characters: base emoji + tone.

                • necovek a day ago

                  TIL, thanks for pointing it out.

              • SkiFire13 a day ago

                To add to the skintone emojis example, country flags emojis are combined characters using two letter characters corresponding to the country code. The various "family" emojis are also combined characters of individual person emojis, and so on.
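The combinations described above can be sketched in a few lines of Rust. This is only an illustration of how the sequences are assembled at the code point level: the helper names `flag`, `with_skin_tone`, and `family` are made up for this example, and nothing here says how a font or renderer will display the result.

```rust
/// Builds a flag emoji from a two-letter ASCII country code by mapping each
/// letter to its Regional Indicator Symbol (U+1F1E6 corresponds to 'A').
/// Hypothetical helper; it does not check that the code is a real country.
fn flag(country_code: &str) -> Option<String> {
    if country_code.len() != 2 || !country_code.chars().all(|c| c.is_ascii_alphabetic()) {
        return None;
    }
    country_code
        .chars()
        .map(|c| {
            let offset = c.to_ascii_uppercase() as u32 - 'A' as u32;
            char::from_u32(0x1F1E6 + offset)
        })
        .collect()
}

/// Appends a skin-tone modifier (U+1F3FB..=U+1F3FF) to a base emoji.
fn with_skin_tone(base: char, modifier: char) -> String {
    let mut s = String::new();
    s.push(base);
    s.push(modifier);
    s
}

/// Joins person emojis with ZERO WIDTH JOINER (U+200D) into one sequence.
fn family(members: &[char]) -> String {
    let mut s = String::new();
    for (i, member) in members.iter().enumerate() {
        if i > 0 {
            s.push('\u{200D}');
        }
        s.push(*member);
    }
    s
}

fn main() {
    let norway = flag("no").expect("two ASCII letters");
    println!("{norway}: {} code points", norway.chars().count());

    // WAVING HAND SIGN followed by a medium skin-tone modifier.
    let wave = with_skin_tone('\u{1F44B}', '\u{1F3FD}');
    println!("{wave}: {} code points", wave.chars().count());

    // MAN, WOMAN, GIRL joined by zero width joiners.
    let fam = family(&['\u{1F468}', '\u{1F469}', '\u{1F467}']);
    println!("{fam}: {} code points", fam.chars().count());
}
```

Each result is several `char`s long even though it is meant to be shown as one glyph, which is why code that counts or slices "characters" needs grapheme cluster segmentation rather than code point counts.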

              • nicoburns 2 days ago

                "draw whatever is in a font at this codepoint" is doing quite a lot of work there. Some emoji fonts just embed a PNG which is easy. But COLRv1 fonts define an entire vector graphics imaging model which is similar to what you need to render an SVG.

                • kibwen 2 days ago

                  Yes, but at this point we're completely outside the scope of Unicode, which has nothing to do with how anything actually gets drawn to the screen.

        • gishh 2 days ago

          [flagged]

    • miki123211 a day ago

      I just wish we could use system tables for that, instead of bloating every executable with their own outdated copy.

      I have no issue with my system using an extra 10mb for Ancient Egyptian capitalization to work correctly. Every single program including those rules is a lot more wasteful.

    • jancsika 2 days ago

      If someone builds, say, a Korean website and needs sort(), does the ICU monolith handle 100% of the common cases?

      (Or substitute for Korean the language that has the largest amount of "stuff" in the ICU monolith.)

      • adzm 2 days ago

        Yes, though it's easy to not use the ICU library properly or run into issues wrt normalization etc
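A small Rust sketch of the kind of normalization and ordering pitfall mentioned here, using only the standard library. It does not use ICU at all; it only shows what happens without normalization or collation data. The function name `sort_by_code_point` and the sample words are made up for this illustration.

```rust
/// Sorts with `str`'s built-in ordering, which compares UTF-8 bytes
/// (equivalent to code point order), not any language's alphabet.
fn sort_by_code_point<'a>(words: &[&'a str]) -> Vec<&'a str> {
    let mut sorted = words.to_vec();
    sorted.sort();
    sorted
}

// Two spellings of "é" that render identically but differ in code points.
const PRECOMPOSED: &str = "\u{00E9}"; // U+00E9 LATIN SMALL LETTER E WITH ACUTE
const DECOMPOSED: &str = "e\u{0301}"; // 'e' followed by U+0301 COMBINING ACUTE ACCENT

fn main() {
    // Without normalization, the two spellings are different strings.
    println!("equal without normalization? {}", PRECOMPOSED == DECOMPOSED);

    // Code point order puts uppercase before lowercase and accented
    // letters after 'z', which is rarely what a human reader expects.
    let words = ["zebra", "\u{00E9}clair", "apple", "Zulu"];
    println!("{:?}", sort_by_code_point(&words));
}
```

Getting a dictionary-style order, or treating the two spellings as equal, requires normalization and collation tables of the kind ICU ships.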

    • twoodfin 2 days ago

      As well-defined as Unicode is, surprising that no one has tried to replace ICU with a better mousetrap.

      Not to say ICU isn’t a nice bit of engineering. The table builds in particular I recall having some great hacks.

      • necovek 2 days ago

        POSIX systems actually have their own approach with "locales" and I think it predates Unicode and ICU.

        Unfortunately, for a long time, POSIX systems were uncommon on desktops, and most Unices do not provide a clean way to extend it from userland (though I believe GNU libc does).

  • embedding-shape 2 days ago

    Is that with any other size optimizations? I think by default, most of them (like codegen-units=1, remove panic handling, etc) are tuned for performance, not binary size, so might want to look into if the results are different if you change them.

    • LtdJorge 2 days ago

      Stripping can save a huge amount of binary size, there’s lots of formatting code added for println! and family, stacktrace printing, etc. However, you lose those niceties if stripping at that level.

    • bcardarella 2 days ago

      I only ran both with `cargo build --release`

  • martin-t 2 days ago

    I was gonna say the last few percent might increase the size disproportionally as the last percent tend to do[0] but looks like boa passes fewer tests (~91%).

    This is something I notice in small few-person or one-person projects. They don't have the resources to build complex architectures so the code ends up smaller, cleaner and easier to maintain.

    The other way to look at it is that cooperation has an overhead.

    [0]: The famous 80:20 rule. Or another claiming that each additional 9 in reliability (and presumably other aspects) takes the same amount of work.

maxloh 2 days ago

Could you compare it with Boa? It is written in Rust too.

https://github.com/boa-dev/boa

  • ivankra 2 days ago

    I have some benchmark results here: https://ivankra.github.io/javascript-zoo/?v8=true

    It's impressively compliant, considering it's just a one man project! Almost as fully featured as Boa, plus or minus a few things. And generally faster too, almost double the speed of Boa on some benchmarks.

    • lucideer 2 days ago

      First time seeing a few of the engines listed here - based on this table I'm surprised Samsung's Escargot hasn't gotten more attention. LGPL, 100% ES2016+ compliance, top 10 perf ranking, 25% the size of V8 & only 318 Github stars.

      A quick HN search shows 0 comments for Escargot - is there some hidden problem with this engine not covered in this table?

      • evilduck 2 days ago

        Because it pretty much only makes sense for Samsung TVs and smart appliances since it scores 3% on the benchmarks vs V8.

        It's too big for most embedded devices, too slow for general computing, and if you can run something 25% the size of V8, you can probably just run V8. If for some reason that size and speed profile does fit your niche and you aren't Samsung wanting to use their own software, then Facebook's Hermes looks better in terms of licensing, speed and binary size and writing compatible JS for it isn't that hard.

        • ivankra 2 days ago

          Escargot is more spec-compliant than Hermes though.

          Sometimes you can't run V8 or any JIT engine because policy or politics, and it's nice to have options.

          • evilduck a day ago

            I get it, I'm not saying it's bad, I just think it fills a small niche and its popularity and reception mirror that.

    • mrec 2 days ago

      Surprised at the lack of a license though.

    • nicoburns 2 days ago

      Interesting. Hermes and QuickJS both come out looking very good in these (in terms of performance vs. binary size)

phplovesong 2 days ago

Why is stuff written in rust always promoted as "written in rust" like its some magic thing?

  • ssrc 2 days ago

    I'm old enough to have seen the "written in lisp", "written in ruby", "written in javascript" eras, among others. It's natural.

    • phplovesong a day ago

      I'm also old enough to recall when ruby (rails really) was the hype everyone jumped on. But I can't say I've ever seen the "rewritten in javascript/X" thing used the way it is with rust.

      In rust i see that the rewrite is the important thing, not the software itself. Its completely bonkers.

  • BiteCode_dev 2 days ago

    Because many people, including myself, have been consistently experiencing better quality from Rust-written software.

    Maybe it's the type of language that attracts people who are interested in getting the details right.

    Or maybe the qualities of the language mean if a project manages to reach the production stage, it will be better than an alternative that would reach the production stage because the minimal level of quality and checks required are better.

    Or maybe it's because it comes with very little friction to install and use the software, because Rust software usually comes with a bunch of binaries from all popular platforms, and often, installers.

    Or maybe the ecosystem is just very good.

    Or maybe it's all together, and something more.

    Doesn't matter.

    The fact is, I did have a better experience with software written in rust than in Python, JS or even Go or Java.

    And I appreciate knowing the software is not written in C or C++, and potentially contains problems regarding security, segfaults, and encoding that are going to bite me down the road, as it's been common in the last 30 years.

    So "written in rust" is a thing I want to know, as it will make me more likely to try said software.

  • enricozb 2 days ago

    It carries some weight, very roughly in the direction of formal verification. Since (assuming there isn't any unsafe), a specific class of bugs are guaranteed to not happen.

    However, this repo seems like it uses quite a bit of unsafe, by their own admission.

    • ptravers 2 days ago

      There's a lot of unsafe in this at least. hard to be both safe and fast.

      • shepherdjerred 2 days ago

        From the README:

        > Compacting garbage collector, written in very unsafe Rust

    • phplovesong 2 days ago

      I mean if i care about safety that much i would just write the damn thing in ATS. Rust has too many escape hatches to be safe anyway.

      • Pfeil 2 days ago

        You never have only one requirement to satisfy. For example, if you'd welcome a certain amount of contributors, your language should be something people know or people like to learn. And of course it may just be the mood of the initiator, which I find completely fine.

        Personally I find rust projects very inviting. Figuring out the amount of unsafe code is easy with grep/rg (to a certain degree), the project structure is pretty standardized, etc. All of this makes even a complex project relatively easy to start with. At the same time, the language is pretty usual (C-like and readable). I understand people like it, and writing "written in rust" is a good call for those people, I guess.

        "Written in JS" would communicate something else than "written in D" or "written in C++". It communicates a lot of things implicitly.

  • kettlecorn 2 days ago

    One simple reason: for those of us invested in the Rust ecosystem it helps us spot new projects we could consider using.

  • westoncb 2 days ago

    I think the idea is like: it took extra work 'cause Rust makes you be so explicit about allocations and types, but it's also probably faster/more reliable because that work was done.

    Of course at the end of the day it's just marketing and doesn't necessarily mean anything. In my experience the average piece of Rust software does seem to be of higher quality though..

    • echelon 2 days ago

      Even forgetting the memory safety and async safety guarantees, the language design produces lower defect code by a wide margin. Google and other orgs have written papers about this.

      There are no exceptions. There are no nulls. You're encouraged to return explicit errors. No weird error flags or booleans or unexpected ways of handling abnormal behaviors. It's all standardized. Then the language syntax makes it easy to handle and super ergonomic and pleasurable. It's nice to handle errors in Rust. Fully first class.

      Result<T,E>, Option<T>, match, if let, if let Ok, if let Some, while let, `?`, map, map_err, ok_or, ok_or_else, etc. etc. It's all super ergonomic. The language makes this one of its chief concerns, and writing idiomatic Rust encourages you to handle errors smartly.

      Because errors were so well thought out, you write fewer bugs.

      Finally, the way the language makes you manage scope, it's almost impossible to write complicated nesting or difficult to follow logic. Hard to describe this one unless you have experience writing Rust, but it's a big contributor to high quality code.

      Rust code is highly readable and easy to reason about (once you learn the syntax). There are no surprises with Rust. It's written simply and straightforwardly and does what it says on the tin.
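A minimal sketch of the error-handling style listed above (`Result`, `Option`, `?`, `ok_or`, `map_err`, `match`, `if let`). The `ConfigError` type and the `lookup` and `port` functions are invented for this illustration and are not taken from any project in this thread.

```rust
use std::num::ParseIntError;

/// Hypothetical error type for this example.
#[derive(Debug, PartialEq)]
enum ConfigError {
    MissingKey(&'static str),
    BadNumber(ParseIntError),
}

/// Returns the value for `key`, or `None` if the key is absent.
fn lookup<'a>(pairs: &[(&'a str, &'a str)], key: &str) -> Option<&'a str> {
    pairs.iter().find(|(k, _)| *k == key).map(|(_, v)| *v)
}

/// Reads and parses the "port" entry, returning an explicit error for
/// each way this can fail instead of a null or an exception.
fn port(pairs: &[(&str, &str)]) -> Result<u16, ConfigError> {
    // `ok_or` turns the Option into a Result; `?` returns early on Err.
    let raw = lookup(pairs, "port").ok_or(ConfigError::MissingKey("port"))?;
    // `map_err` converts the parser's error into our own error type.
    let port = raw.trim().parse::<u16>().map_err(ConfigError::BadNumber)?;
    Ok(port)
}

fn main() {
    match port(&[("host", "localhost"), ("port", "8080")]) {
        Ok(p) => println!("port = {p}"),
        Err(e) => println!("error: {e:?}"),
    }

    if let Err(e) = port(&[("port", "eighty")]) {
        println!("error: {e:?}");
    }
}
```

The caller has to decide what to do with each `Err`. Calling `.unwrap()` on the result would compile as well, but it would panic at runtime on the error cases.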

      • phplovesong 2 days ago

        Thats not special to rust in any way or form. Most of mentioned features are stolen from ML, and in some cases badly. Eg rust has unwrap thats basically a ticking time bomb waiting to blow up. Rust has many other ways to blow up the program. Its not only about memory safety (80% of rust apps in the wild dont benefit from "memory safety" in any way or form).

        • nicoburns 2 days ago

          > Most of mentioned features are stolen from ML

          Rust is the most popular ML-derived language, so if someone is considering Rust vs. some other language the chances are the other one they're considering does not have all the ML goodies.

        • Tadpole9181 2 days ago

          Okay, but the alternative isn't ML; virtually all of this software would otherwise be written in C or C++.

        • tcfhgj 2 days ago

          IMHO every app benefits from memory safety.

          Memory safety doesn't only have security implications, but reduces crashes, misbehavior and corrupt data.

          You don't want either in any software, which has to fulfill a task in a productive way.

          • phplovesong a day ago

            Sure, but a GC gives you that. Having manual memory management (like in C) is something only a very few applications REALLY need. Hell i have seen web frontends written in rust. Now the circle is complete.

            • tcfhgj a day ago

              1. It doesn't give you that necessarily, see Go.

              2. Rust doesn't have memory management like in C. In Rust, abstractions and the compiler manage memory for you, except when you opt into C-like memory management using unsafe.

              3. The comment was about memory safety, not memory management, and its benefit.

              4. In case GC vs manual memory management was meant as a speed comparison: You might not REALLY need the speed of Rust, but I gladly take it where I can. I am tired of sluggish resource hogging electron apps and similar. Electron probably destroyed at least 10-15 years of progress in hardware performance gains.

      • shuraman7 2 days ago

        are you for real? Rust most definitely is not highly readable and the language reeks of complexity

        • danielheath 2 days ago

          Readable is relative; it’s much less readable than some languages, but much more than the ones it’s largely displacing.

          I would much rather try and figure out a bug in unfamiliar rust than in unfamiliar cpp.

        • echelon 2 days ago

          It's comparable with Java.

          It's significantly easier to parse than C++.

          • hu3 2 days ago

            Java doesn't have lifetimes. So no.

            And C++ can only be beaten by the likes of K, J and brainfuck. Very low bar to clear.

  • the__alchemist 2 days ago

    In the case of libraries, this distinction is important; we've set up our computing infrastructure in a way replete with barriers which drive repeated efforts in isolation. Therefore, if a library is written in Rust, it suggests that I can use it in my Rust program clear of a conspicuous barrier type.

    For an application, service, etc like this... it is not relevant.

  • rhubarbtree a day ago

    It’s a very Blub. Rust seems ok as a language, but some young uns moving from say JS/TS think it’s amazing because they haven’t used anything else.

  • general1465 2 days ago

    It is becoming meme like Arch - It is written in Rust btw.

  • 65 2 days ago

    Usually people think Rust = fast, so "Written in Rust" might imply it runs fast.

  • p1necone 2 days ago

    The people that complain about projects saying they were written in rust are far more annoying than the projects themselves at this point.

  • almost 2 days ago

    When it's a library it's fairly important what it's written in (or at least written for)

  • dainiusse 2 days ago

    Yes. Getting really odd...

  • noitpmeder 2 days ago

    [flagged]

    • cyanydeez 2 days ago

      The virtue of being memory safe is probably more a value than a virtue.

      Also, without associated social virtue signaling, what do you think is wrong with signaling?

boianmihailov 2 days ago

"Compacting garbage collector, written in very unsafe Rust" got me cracking.

  • varispeed 2 days ago

    Sorry for the offtop, but I really miss the cracktros. Imagine having Ikari intro before you boot into your OS.

    • vardump 2 days ago

      Sorry also for being offtopic, but "cracking" in this case most likely refers to cracking [with laughter].

      • varispeed 2 days ago

        "Don't. Say. Crack, Jez. Yeah? Please. Not now. Cause you saying crack makes me think about crack and I love crack. So can you not say crack?"

    • BoredPositron 2 days ago

      Plymouth lets you do it on Linux without hacking around like osx or windows.

383toast 2 days ago

how does this compare to existing JS engines?

  • echelon 2 days ago

    You can embed this one in your Rust programs. No linking to C/C++. All native Rust.

    That little 40 mb single binary server you wrote can now be scripted in JavaScript.

    This is frankly awesome, and now there are multiple Rust-native JavaScript engines. And they both look super ergonomic.

cluckindan 2 days ago

Memory safety is one of Rust’s biggest selling points. It’s a bit baffling that this engine would choose to implement unsafe garbage collection.

  • scratcheee 2 days ago

    The obvious use-case for unsafe is to implement alternative memory regimes that don’t exist in rust already, so you can write safe abstractions over them.

    Rust doesn’t have the kind of high performance garbage collection you’d want for this, so starting with unsafe makes perfect sense to me. Hopefully they keep the unsafe layer small to minimise mistakes, but it seems reasonable to me.

  • nicoburns 2 days ago

    Even using something as simple as Vec means using `unsafe` code (from the std library). The idea isn't to have no `unsafe` code (which is impossible). It's to limit it to small sections of your code that are much more easily verifiable.

    For some use cases, that means that "user code" can have no `unsafe`. But implementing a GC is very much not one of those.

  • jeroenhd 2 days ago

    Rust also has some nice language features. Even unsafe rust doesn't have the huge "undefined behaviour" surface that languages like C++ still contain.

    If I were to write a toy JS runtime in Rust, I'd try to make it as safe as possible and deal with unsafe only when optimization starts to become necessary, but it's not like that's the only way to use Rust.

    • LtdJorge 2 days ago

      That’s the philosophy. Use the less constrained (but still somewhat constrained and borrow checked) unsafe to wrap/build the low level stuff, and expose a safe public API. That way you limit the exposure of human errors in unsafe code to a few key parts that can be well understood and tested.
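A minimal sketch of that pattern, assuming nothing about how Brimstone's garbage collector is actually written: a hypothetical fixed-capacity stack, `FixedStack`, keeps its one `unsafe` operation behind a safe public API, so callers cannot read an uninitialized slot.

```rust
use std::mem::MaybeUninit;

/// Hypothetical fixed-capacity stack. Invariant: `slots[..len]` are
/// initialized and `slots[len..]` are not.
pub struct FixedStack<T, const N: usize> {
    slots: [MaybeUninit<T>; N],
    len: usize,
}

impl<T, const N: usize> FixedStack<T, N> {
    pub fn new() -> Self {
        Self {
            slots: std::array::from_fn(|_| MaybeUninit::uninit()),
            len: 0,
        }
    }

    /// Pushes a value, handing it back as `Err` if the stack is full.
    pub fn push(&mut self, value: T) -> Result<(), T> {
        if self.len == N {
            return Err(value);
        }
        self.slots[self.len].write(value);
        self.len += 1;
        Ok(())
    }

    /// Pops the most recently pushed value, if any.
    pub fn pop(&mut self) -> Option<T> {
        if self.len == 0 {
            return None;
        }
        self.len -= 1;
        // SAFETY: the slot at the old `len - 1` was initialized by `push`,
        // and decrementing `len` first ensures it is never read again.
        Some(unsafe { self.slots[self.len].assume_init_read() })
    }
}

impl<T, const N: usize> Drop for FixedStack<T, N> {
    fn drop(&mut self) {
        // Drop any values still stored so they are not leaked.
        while self.pop().is_some() {}
    }
}

fn main() {
    let mut stack = FixedStack::<String, 2>::new();
    stack.push("a".to_string()).unwrap();
    stack.push("b".to_string()).unwrap();
    println!("full: {:?}", stack.push("c".to_string()));
    println!("pop: {:?}", stack.pop());
}
```

Only the single `unsafe` expression and the invariant it relies on need manual review; code that uses `FixedStack` stays entirely in safe Rust.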

  • swiftcoder 2 days ago

    The whole point of unsafe is to be able to circumvent the guardrails where the developer knows something the compiler isn't (yet) smart enough to understand. It's likely that implementing a high-performance GC runs afoul of quite a few of those edge cases.

  • the__alchemist 2 days ago

    IMO the memory safety aspect is overblown by enthusiasts and purists. Rust is an overall nice fast imperative language.

    • phplovesong 2 days ago

      Rust WAS really nice before it got mangled with syntax like we never seen before. Graydon did not imagine rust as what it is today. Rust core wo. async is ok, but in practice rust projects tend to have hundreds of deps and really slow compiles. Its just like javascript with npm.

      • swiftcoder 2 days ago

        > in practice rust projects tend to have hundreds of deps

        That's really just any language with a built-in package manager. Go somewhat sidesteps this by making you vendor your dependencies, but very few other languages escape the ballooning dependency graph.

        • phplovesong 2 days ago

          Go has probably more packages than Rust, but i rarely see Go projects that use as many as rust. In Go the usuals are depending on the app, possibly some db drivers, a simple router and maybe some crypto related thing. Most projects do fine with just the stdlib. In rust i tend to see 100 deps, and 1000 transient deps. Compile times are not in seconds, but minutes.

          • swiftcoder 2 days ago

            You can do that just fine in Rust too, for example the Makepad[1] developers take a pretty extreme non-invented-here stance, and build a full UI toolkit with no external dependencies (and a major focus on clean-build compile times).

            However, it isn't really part of the Rust OSS culture to operate like that. The culture typically values safety over compile times, and prefers to lean on a deep stable of battle-hardened abstractions.

            [1]: https://makepad.nl, https://github.com/makepad/makepad

            • hu3 2 days ago

              It's still hundreds of functionalities that they have to maintain themselves instead of leveraging a much more battle tested stdlib.

      • the__alchemist 2 days ago

        This is a concern when viewing the Rust experience. It can be avoided by judiciously choosing dependencies that you need, and that have shallow trees of their own. I like to distinguish the rust lang from The rust OSS community. They overlap, but can be treated separately if you exercise care.

tetris11 2 days ago

There's no license I can see

  • HansHalverson 2 days ago

    That was an oversight, this is now under the MIT license!

  • bsnnkv 2 days ago

    Great to see more projects not opting into licenses which permit megacorp exploitation by default

    • robocat 2 days ago

      Many megacorps provide value to users. For example Google and Apple are used by maybe 75% of humanity. Google in particular appears to have given back into the ecosystem (often to Google's detriment). It isn't as binary as you make it to be.

fgallih 2 days ago

[flagged]

  • chiffaa 2 days ago

    > who in the fuck would write a garbage collector using garbage collected Rust?

    Rust is not garbage collected unless you explicitly opt into using Rc/Arc

    • gpm 2 days ago

      If you count Rc/Arc as garbage collection you should count RAII + The Borrow Checker (i.e. all safe rust) as garbage collection too IMHO. It collects garbage just as automatically - it just does so extremely efficiently.

      That said I tend to count neither. Garbage collection to me suggests you have something going around collecting it, not just you detect you're done with something when you're done with it and deal with it yourself.

    • oconnor663 2 days ago

      I still wouldn't call it GC in that case. It's pretty much exactly the same as std::shared_ptr in C++, and we don't usually call that GC. I don't know about the academic definition, but I draw the line at a cycle collector. (So e.g. Python is GC'd, but Rust/C++/Swift are not.)

      • cmrdporcupine 2 days ago

        I consider reference counting to be garbage collection, and so do most CS textbooks. However Rc/Arc/shared_ptr are GC facilities used (often sparingly) inside predominantly non-GC'd languages, so, yeah, I wouldn't say Rust "is" or "has" GC. It has facilities for coping with cleanup, both RAII and GC.
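To make the distinction in this subthread concrete, here is a small sketch of how `Rc` behaves, using only the standard library. The `Node` type and the two functions are invented for this example. A value is freed at the moment its last `Rc` handle is dropped, and because there is no cycle collector, two values that point at each other are never freed.

```rust
use std::cell::{Cell, RefCell};
use std::rc::Rc;

/// Hypothetical node that counts how many times any node is dropped.
struct Node {
    next: RefCell<Option<Rc<Node>>>,
    dropped: Rc<Cell<u32>>,
}

impl Drop for Node {
    fn drop(&mut self) {
        self.dropped.set(self.dropped.get() + 1);
    }
}

/// Two handles to one node: it is freed when the last handle goes away.
fn drops_without_cycle() -> u32 {
    let dropped = Rc::new(Cell::new(0));
    let a = Rc::new(Node { next: RefCell::new(None), dropped: Rc::clone(&dropped) });
    let b = Rc::clone(&a);
    assert_eq!(Rc::strong_count(&a), 2);
    drop(a); // count goes 2 -> 1, node still alive
    drop(b); // count goes 1 -> 0, node dropped right here
    dropped.get()
}

/// Two nodes that reference each other: the counts never reach zero.
fn drops_with_cycle() -> u32 {
    let dropped = Rc::new(Cell::new(0));
    let a = Rc::new(Node { next: RefCell::new(None), dropped: Rc::clone(&dropped) });
    let b = Rc::new(Node {
        next: RefCell::new(Some(Rc::clone(&a))),
        dropped: Rc::clone(&dropped),
    });
    *a.next.borrow_mut() = Some(Rc::clone(&b)); // closes the cycle a -> b -> a
    drop(a);
    drop(b);
    dropped.get() // both nodes are leaked, so nothing was dropped
}

fn main() {
    println!("dropped without cycle: {}", drops_without_cycle());
    println!("dropped with cycle: {}", drops_with_cycle());
}
```

A tracing collector, such as the one a JavaScript engine needs, would reclaim the cyclic pair once it becomes unreachable. With `Rc` the usual workaround is to break the cycle with `std::rc::Weak`.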

  • MattRix 2 days ago

    Rust is not garbage collected though.

    • vampirex 2 days ago

      Yes, but safe Rust enforces strict borrow checking with tracing, reference counting, etc. which would be inefficient for GC implementation.

larusso 2 days ago

Love the name of the executable ;) For my taste it just sounds right. BS as in Bullsh#t :)