pickpuck 13 hours ago

What if we extended this idea beyond one dataset to all discrete news events and entities: people, organizations, places?

Just like here you could get a timeline of key events, a graph of connected entities, links to original documents.

Newsrooms might already do this internally idk.

This code might work as a foundation. I love that it's RDF.

  • VikingCoder 12 hours ago

    Sci-Fi Author: In my book I invented the Torment Nexus as a cautionary tale

    Tech Company: At long last, we have created the Torment Nexus from classic sci-fi novel Don't Create The Torment Nexus

  • jandrewrogers 11 hours ago

    This has been attempted many times. They all fail the same way.

    These general data models start to become useful and interesting at around a trillion edges, give or take an order of magnitude. A mature graph model would be at least a few orders of magnitude larger, even if you aggressively curated what went into it. This is a simple consequence of the cardinality of the different kinds of entities that are included in most useful models.

    No system described in open source can get anywhere close to even the base case of a trillion edges. They will suffer serious scaling and performance issues long before they get to that point. It is a famously non-trivial computer science problem and much of the serious R&D was not done in public historically.

    This is why you only see toy or narrowly focused graph data models instead of a giant graph of All The Things. It would be cool to have something like this but that entails some hardcore deep tech R&D.

    • michelpp 10 hours ago

      There are open source projects moving toward this scale. The GraphBLAS, for example, uses an algebraic formulation over compressed sparse matrix representations for graphs that is designed to be portable across many architectures, including CUDA. It would be nice if companies like Nvidia could get more behind our efforts, as our main bottleneck is development hardware access.

      To plug my project: I've wrapped the SuiteSparse GraphBLAS library in a Postgres extension [1] that fluidly blends algebraic graph theory with the relational model. The main flow is to use SQL to structure complex queries for starting points, then use the GraphBLAS to flow through the graph to the endpoints, then join back to tables to get the relevant metadata. On cheap Hetzner hardware (AMD EPYC, 64 cores) we've achieved 7 billion edges per second BFS over the largest graphs in the SuiteSparse collection (~10B edges). With our CUDA support we hope to push that kind of performance into graphs with trillions of edges.

      [1] https://github.com/OneSparse/OneSparse
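
      As a rough illustration of the algebraic BFS idea described above, here is a plain-Python sketch. It is not GraphBLAS or OneSparse code: the function name `bfs_levels` and the dict-of-sets adjacency are assumptions made for the example. Each loop iteration plays the role of one Boolean vector-times-matrix product, masked by the nodes already visited.

```python
# Illustrative only: a plain-Python stand-in for frontier-based BFS,
# not the GraphBLAS or OneSparse API.

def bfs_levels(adj, source):
    """Return {node: level} for every node reachable from source.

    adj maps each node to the set of its out-neighbours, i.e. the
    nonzero columns of that node's row in a sparse adjacency matrix.
    """
    levels = {source: 0}
    frontier = {source}
    depth = 0
    while frontier:
        depth += 1
        # OR-AND style product: union the rows selected by the frontier.
        reached = set()
        for node in frontier:
            reached |= adj.get(node, set())
        # Mask: keep only nodes that have not been assigned a level yet.
        frontier = reached.difference(levels)
        for node in frontier:
            levels[node] = depth
    return levels


# Tiny hypothetical graph, built from an edge list.
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
adj = {}
for src, dst in edges:
    adj.setdefault(src, set()).add(dst)

levels = bfs_levels(adj, "a")
print(sorted(levels.items()))
```

      A real sparse-matrix library would do the same expansion over compressed rows in bulk rather than one Python set at a time.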

    • babelfish 11 hours ago

      I don't have any experience with graph modeling, but it seems like Neo4j should be able to support 1 trillion edges, based on this (admittedly marketing) post of theirs? https://neo4j.com/press-releases/neo4j-scales-trillion-plus-...

      • jandrewrogers 10 hours ago

        The graph database market has a deserved reputation for carefully crafting scaling claims that are so narrowly qualified as to be inapplicable to anything real. If you aren't deep into the tech you'll likely miss it in the press releases. It is an industry-wide problem, I'm not trying to single out Neo4j here.

        Using this press release as an example, if you pay attention to the details you'll notice that this graph has an anomalously low degree. That is, the graph is very weakly connected, lots of nodes and barely any edges. Typical graph data models have much higher connectivity than this. For example, the classic Graph500 benchmark uses an average degree of 16 to measure scale-out performance.

        So why did they nerf the graph connectivity? One of the most fundamental challenges in scaling graphs is optimally cutting them into shards. Unlike most data models, no matter how you cut up the graph some edges will always span multiple shards, which becomes a nasty consistency problem in scale-out systems. Scaling this becomes exponentially harder the more highly connected the graph. So basically, they defined away the problem that makes graphs difficult to scale. They used a graph so weakly connected that they could kinda sorta make it work on a thousand(!) machines even though it is not representative of most real-world graph data models.
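
        To make the sharding point concrete, here is a small sketch that counts how many edges end up spanning two shards under a naive placement rule. Everything in it is an assumption for illustration: the random graph, the modulo placement in `modulo_shard`, and the shard counts. It only reuses the average degree of 16 mentioned above. With uniformly random endpoints and k evenly filled shards, roughly 1 - 1/k of the edges cross a shard boundary.

```python
import random


def cut_fraction(edges, num_shards, shard_of):
    """Fraction of edges whose two endpoints live on different shards."""
    crossing = sum(
        1 for u, v in edges
        if shard_of(u, num_shards) != shard_of(v, num_shards)
    )
    return crossing / len(edges)


def modulo_shard(node, num_shards):
    # Hypothetical placement rule: integer node id modulo the shard count.
    return node % num_shards


def random_graph(num_nodes, avg_degree, seed=0):
    """Random edge list with about avg_degree edge endpoints per node."""
    rng = random.Random(seed)
    num_edges = num_nodes * avg_degree // 2
    return [
        (rng.randrange(num_nodes), rng.randrange(num_nodes))
        for _ in range(num_edges)
    ]


edges = random_graph(num_nodes=10_000, avg_degree=16)
for k in (2, 8, 64):
    print(k, "shards:", round(cut_fraction(edges, k, modulo_shard), 3))
```

        Smarter partitioners can lower the cut on graphs with community structure, but on a densely connected graph a large share of edges still cross shards.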

    • stevage 9 hours ago

      >These general data models start to become useful and interesting at around a trillion edges

      That is a wild claim. Perhaps for some very specific definition of "useful and interesting"? This dataset is already interesting (hard to say whether it's useful) at a much tinier scale.

      • jandrewrogers 8 hours ago

        It was a widely observed heuristic going back to the days when the Semantic Web was trendy. The underlying reason is also obvious once stated.

        Almost every non-trivial graph data model about the world is a graph of human relationships in the population. If not directly then by proxy. Population scale human relationship graphs commonly pencil out at roughly 1T edges, a function of the population size. It is also typically the highest cardinality entity. Even if the purpose isn’t a human relationship graph, they all tend to have one tacitly embedded with the scale implied.

        If you restrict the set of human entities, you either end up with big holes in the graph or it is a graph that is not generally interesting (like one limited to company employees).

        The OP was talking about generalizing this to a graph of people, places, events, and organizations, which always has this property.

        It is similar to the phenomenon that a vast number of seemingly unrelated statistics are almost perfectly correlated with GDP.

      • zozbot234 9 hours ago

        This is not a "general purpose data model", though. A better example would be Wikidata which at about 100M nodes and 1B edges (so orders of magnitude less than that 1T claim) is already enabling plenty of useful queries about all sorts of publicly-available data and entities.

    • mmooss 7 hours ago

      > It is a famously non-trivial computer science problem and much of the serious R&D was not done in public historically.

      Could you point us to any public research on this issue? Or the history of the proprietary research? Just the names might help - maybe there are news articles, it's a section in someone's book, etc.

    • theteapot 9 hours ago

      > It would be cool to have something like this ..

      Aren't LLMs something like this?

      • djtango 9 hours ago

        An LLM probabilistically produces tokens over its model, which is why it can hallucinate, whilst an actual graph model would not have that issue.

  • afavour 10 hours ago

    The New York Times has an API that lets you query “tags” or “topics” and the articles associated with them:

    https://developer.nytimes.com/docs/semantic-api-product/1/ov...

    The Guardian has similar:

    https://open-platform.theguardian.com/documentation/tag

    Either or both could be an interesting starting point for something like that. I tried to find something for the BBC and was surprised they didn’t have anything. I would have figured public media would have been a great resource for this.

  • ggm 10 hours ago

    Given 6 degrees is rooted in reality, this means we can draw causal graphs from anyone (bad) to anyone (we don't like) and then invent specious reasons why it means "it's all connected, man"

    That said, some networks of shorter paths than 6 are interesting. Right now, there's a 1:1 direct path from these documents to a bunch of people with an interest in confounding what evidentiary value they have in justice processes. That's more interesting to me, than what the documents say right now.

  • johongo 11 hours ago

    Emil Eifrem (founder of Neo4j) has a talk about them doing this with the Panama papers

  • Centigonal 9 hours ago
    • scotty79 4 hours ago

      300 categories, 60 attributes ... Doesn't sound very high res.

    • pbronez 4 hours ago

      Yup, this is a fantastic project and probably the most mature attempt at a global knowledge graph for contemporary news.

  • j-pb 13 hours ago

    If it's RDF it won't work as the foundation.

  • axus 13 hours ago

    One wonders what the US government agencies use.

    • cjohnson318 13 hours ago

      They probably use Excel, maybe Microsoft Access.

      • ToucanLoucan 12 hours ago

        Microsoft Access form that connects via IIS to an Excel spreadsheet acting as a database. Also the server it's running on is sitting on a wooden table.

    • abnercoimbre 13 hours ago

      I think you meant one shudders. And yeah, Snowden made it clear there's orders of magnitude more data than this graph explorer for them to sift through.

    • dboreham 12 hours ago

      Internet search engines have their origins in government projects fwiw. They had search engines before Alta Vista, used for searching data sets that pre-date the internet, and some of the people involved in those went to work on the original commercial search engines.

ChrisMarshallNY 14 hours ago

Oh Cthulhu, this is like a periscope into a septic tank...

  • bamboozled 12 hours ago

    Yes almost no one has been held accountable for any of it, "weird"?

    • rich_sasha 8 hours ago

      As "The Rest Is Politics" podcast points out, the meagre consequences mostly came to Brits: Ghislaine Maxwell, Prince Andrew aka Andrew Mountbatten, and the former UK ambassador to the US.

      Americans..?

      • scotty79 4 hours ago

        Americans don't really do accountability all that much. People there who get to face the consequences are usually the ones that significantly harmed the financial interests of the very rich. Madoff, Holmes, Bankman. They operate more on a vengeance than accountability system.

    • Y_Y 9 hours ago

      What accountability would you suggest?

      • names_are_hard an hour ago

        Eternal shame and public opprobrium. At minimum, elected officials connected with impropriety should step down, and the public should be so disgusted that they have no hope of ever serving in public office again.

      • octoberfranklin 9 hours ago

        Prison?

        • gruez 8 hours ago

          We're going to send people to jail based purely on hearsay from Epstein or his affiliates?

          • wredcoll 7 hours ago

            What about... investigations...

            • gruez 6 hours ago

              If the evidence is strong enough, sure. But as much as I like a "the elites are pedophiles" witchhunt, given that the Biden administration sat on it, it's probably safe to conclude the evidence isn't great. The Trump administration is trying to get another whack at it, but given their recent history of investigations, it's probably safe to conclude that's purely politically motivated rather than some cold case that got cracked.

boogheta 11 hours ago

It's a bit too bad that the network visualisation relies on d3: it is really slow with big networks, and the force directed algorithm is far from the best. Have you tried using JS libraries built specifically to visualise graph networks such as Sigma.js, Vivagraph or Cytoscape?

  • tootyskooty 11 hours ago

    Shameless plug: if OP is looking to stay on d3, he could also try slotting in my C++/WASM versions[1] of the main d3 many-body forces. Not the best, but I've found >3x speedup using these for periplus.app :)

    [^1]: https://www.npmjs.com/package/d3-manybody-wasm

jrochkind1 14 hours ago

Why are they all moving, what does the time axis represent?

  • alhadrad 14 hours ago

    It's because the layout system also has a physics system.

  • piyh 14 hours ago

    >A force-directed graph is a technique for visualizing networks where nodes are treated like physical objects with forces acting between them to create a stable arrangement. Attractive forces (like springs) pull connected nodes together, while repulsive forces (like electric charges) push all nodes apart, resulting in a layout where connected nodes are closer and unconnected nodes are more separated

    https://observablehq.com/@d3/force-directed-graph/2

    • oskarkk 5 hours ago

      I think it would be better and faster if the website calculated the positions of the nodes in the background (with a good enough limit of iterations), and then showed the result. Animating 4k nodes and 25k edges (15k by default) is a waste of CPU and is laggy even on my high-end CPU. But maybe the author was limited by the tools used.
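
      A minimal sketch of that "compute first, draw once" approach, assuming a simple spring-plus-repulsion model. This is not d3's implementation or the site's code: the function `precompute_layout`, its parameters, and the default constants are all made up for illustration. The all-pairs repulsion is O(n^2) per iteration, so a graph with thousands of nodes would need an approximation such as a quadtree.

```python
import math
import random


def precompute_layout(nodes, edges, iterations=300, seed=0,
                      repulsion=0.01, spring=0.05, step=0.1, max_move=0.1):
    """Run a fixed number of force iterations and return final positions."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for n in nodes}
    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}
        # Repulsion: every pair of nodes pushes apart, weaker with distance.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist2 = dx * dx + dy * dy + 1e-9  # avoid division by zero
                fx = repulsion * dx / dist2
                fy = repulsion * dy / dist2
                force[a][0] += fx
                force[a][1] += fy
                force[b][0] -= fx
                force[b][1] -= fy
        # Attraction: each edge acts like a spring pulling its ends together.
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            force[a][0] -= spring * dx
            force[a][1] -= spring * dy
            force[b][0] += spring * dx
            force[b][1] += spring * dy
        # Move each node, capping the displacement to keep the run stable.
        for n in nodes:
            mx = step * force[n][0]
            my = step * force[n][1]
            length = math.hypot(mx, my)
            if length > max_move:
                mx *= max_move / length
                my *= max_move / length
            pos[n][0] += mx
            pos[n][1] += my
    return {n: (p[0], p[1]) for n, p in pos.items()}


# Hypothetical toy graph: a and b are linked, c is unconnected.
layout = precompute_layout(["a", "b", "c"], [("a", "b")], iterations=500)
for name, (x, y) in layout.items():
    print(name, round(x, 3), round(y, 3))
```

      The page would then render the returned coordinates once instead of animating every tick.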

Ms-J 7 hours ago

This is great work to show relationships and connections. The government gets scared by these types of efforts as there are many members who are extremely guilty of crimes related to this and others.

We need to expand on network mapping with data and areas as well.

liotier 14 hours ago

"Brad Edwards" and "Bradley Edwards" might be the same individual.

  • DrewADesign 13 hours ago

    I’m sure some developer/archivist is working on a name authority as we speak.

  • tovej 13 hours ago

    Yes, the dataset also has three entries for Virginia Giuffre, "Virginia L. Giuffre", "Virginia Roberts Giuffre", and "Jane Doe Number 3 (Virginia Roberts)"

  • cyrusradfar 12 hours ago

    great use case for using AI to suggest mergers and clean up.

    • specproc 12 hours ago

      LLMs are awful for this. I've got a project that's doing structured extraction and half the work is deduplication.

      I didn't go down the route of LLMs for the clean up, as you're getting into scale and context issues with larger datasets.

      I got into semantic similarity networks for this use case. You can do efficient pairwise matching with Annoy, set a cutoff threshold, and your isolated subgraphs are merger candidates.

      I wrapped up my code in a little library if you're into this sort of thing.

      github.com/specialprocedures/semnet
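
      A rough sketch of the threshold-and-subgraph idea, using only the standard library. It is not the semnet API, and it swaps in `difflib` string similarity with brute-force pairwise comparison where the comment above describes semantic embeddings with Annoy. The function name `merge_candidates` and the 0.7 cutoff are assumptions for the example. Names are linked when their similarity clears the cutoff, and each connected group of two or more names is a merger candidate.

```python
from difflib import SequenceMatcher
from itertools import combinations


def merge_candidates(names, threshold=0.7):
    """Group names whose pairwise similarity clears the threshold."""
    parent = list(range(len(names)))  # union-find forest over name indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Brute-force all pairs; an approximate nearest-neighbour index
    # would replace this loop on a large dataset.
    for i, j in combinations(range(len(names)), 2):
        score = SequenceMatcher(None, names[i].lower(), names[j].lower()).ratio()
        if score >= threshold:
            parent[find(j)] = find(i)

    # Collect connected components and drop isolated names.
    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return [group for group in groups.values() if len(group) > 1]


names = [
    "Brad Edwards",
    "Bradley Edwards",
    "Virginia L. Giuffre",
    "Virginia Roberts Giuffre",
    "Larry Summers",
    "Lawrence Summers",
    "Ghislaine Maxwell",
]
for group in merge_candidates(names):
    print(group)
```

      String similarity will miss aliases that share few characters, which is where embedding-based similarity would be expected to do better.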

  • adolph 11 hours ago

    I read a recent observation that people subject to discovery are often making purposeful typos in key names in order for the communication to remain under the radar.

  • GuinansEyebrows 13 hours ago

    Likewise for instances of "Larry" and "Lawrence" Summers... probably a lot of those.

bfkwlfkjf 9 hours ago

Anybody else enjoying the fact that maga manufactured this outrage and now it's being turned against them?

  • Danjoe4 3 hours ago

    If you look at this graph and your prescient thought is "haha take that MAGA" then you are a brainwashed ideologue. This graph gives a window into the layers of rot in our political system. The complexity is perfectly represented by its form but it seems like your graph is just a big arrow that says "orange man bad".

yndoendo 12 hours ago

After seeing this, I'm interested in a map of each person to assist with knowing who they are, who they worked for during the email date, and who they currently work for.

Bender 8 hours ago

Trump has reversed course [1] "Trump reverses on Epstein files, says he’d sign bill calling for their release"

[1] - https://www.youtube.com/watch?v=XcebHfZ2LbU [video][4 mins]

  • nofriend 8 hours ago

    Given how strongly he was against it, this is pretty clearly a ploy. He could release them unilaterally if he hadn't just reopened the investigation which he himself shut down.

    • Bender 8 hours ago

      Yeah I have not a clue what is going on behind the scenes or why the previous admin did not release them.

      • walls 7 hours ago

        They were sealed at the time, and the admin was following the law.

    • wredcoll 7 hours ago

      Famously he asked the fbi to redact his name.

theultdev 13 hours ago

This is the best rendition I've seen so far.

The Bill Clinton entity is interesting.

> 2009: Bill Clinton discontinued association with Jeffrey Epstein

> 2010: Jeffrey Epstein provided flights on jets to Bill Clinton

> 2010-2011: Jeffrey Epstein traveled via private aircraft with Bill Clinton

> 2011: Ghislaine Maxwell piloted helicopter for Bill Clinton

> 2014: Bill Clinton alleged presence at sex parties

> 2015: Bill Clinton distanced relationship from Jeffrey Epstein

Wasn't very good at discontinuing the relationship it seems.

Guess there is precedent for him lying about sexual activities though.

I think a sentiment analysis between the friendliness and social meetups between Epstein and other individuals would be useful.

Who were his friends after 2008 when he was first convicted?

Those who were still friends with him after 2008 were in on it or guilty by association, if not legally, socially.

Friends like Reid Hoffman and Larry Summers...

> From: Reid Hoffman

> Sent: 7/6/2015 5:04:31 PM

> To: jeffrey E. [jeeyacation@gmail.com]

> Subject: RE: ICYMI

> slow progress.

> planning to see you in August.

> Hope you're well.

Larry Summers has too many to list. Doesn't look good though digging through them.

  • beepbooptheory 13 hours ago

    This is obviously the correct lens but note that the 2008 plea deal was so neutered by the time of settlement it made it somewhat easy to stay friends with him.

    This is of course on top of the 2006 Florida prostitution charge though.

    • theultdev 12 hours ago

      Especially when Epstein was paying off journalists at the NYT and intimidating other outlets.

      But point being those people that were friends with him had to know. Whether it was socially acceptable by the elite because the public wasn't aware isn't very relevant.

  • octoberfranklin 9 hours ago

    > Wasn't very good at discontinuing the relationship it seems.

    Keep in mind that those summaries are AI-generated. There's gonna be a lot of confabulating in there.

    • theultdev 8 hours ago

      Yes, but the summaries generated are referenced with sources.

      Care to dispute the summaries using the sources?

      • godelski 6 hours ago

        I read the gp as saying you should just check the sources, not defending.

        I mean here's a weird example. Searching Donald Trump there's the headline

          (1994-06 Wexner Mansion NYC) 
          Donald Trump forced to perform oral sex and physically abused 13-year-old female plaintiff and 12-year-old female. 
        
        Like that sounds weird... DT forced to rape? That doesn't make sense to me. The longer summary reads

          A declaration from Tiffany Doe (pseudonym) testifying that she witnessed Jeffrey Epstein and Donald Trump sexually abuse a 13-year-old girl and other minors during parties from 1990-2000 in New York City. 
        
        It references House Oversight 025937. The actual document looks much more like that summary. Here's a snippet

          7. It was at these series of parties that I personally witnessed the Plaintiff being forced to perform various sexual acts with Donald J. Trump and Mr. Epstein. Both Mr. Trump and Mr. Epstein were advised that she was 13 years old.
        
        It gets worse so if you want to look further it's Case 1:16-cv-04642 Document 1-2 Filed 06/20/16 Page 1 of 2.

        So far the paragraph summaries seem to be accurate in my poking around but the headlines are mixing ordering and have other weird errors like this. Anyways, always good to check when things are as serious as this...

        Here's the link: https://drive.google.com/file/d/11KzAOYCjxwEhnyrsiBpKM8OGBJp...

        • rayiner 3 hours ago

          Note that this document is from an anonymous lawsuit that was withdrawn and never substantiated or corroborated.

  • tinyplanets 13 hours ago

    I'd take a look at Trump. He's on a whole different level. Lots of rape and sexual abuse of minors... wow.

    • bamboozled 12 hours ago

      Seems to get away with it all, meanwhile, we all pay our taxes, don't break any laws and just be "good people".

    • theultdev 13 hours ago

      Of course, deflect discussion to Trump. Does that make any of those other people look better to you?

      Trump gave information against Epstein in 2009 and unlike Bill and others did cut ties after learning he was poaching girls from Mar-a-Lago.

      I specifically made the point to look into those who were friends with Epstein even after knowing what he was doing.

      Nice whataboutism though. Feel free to reference source materials to support your claims.

      Btw are you a bot or is that just a canned statement you use?

      • hiccuphippo 11 hours ago

        Well for one those other people are not the current president of the most powerful country in the world.

        But sure, lock all of them up, just don't ignore a few because they are too powerful.

        • JumpCrisscross 10 hours ago

          It’s been wild to see people subsume not defending child rapists to their partisan identity.

          I’m still convinced it’s a minority of loud voices online and on social media.

          • rayiner 4 hours ago

            It’s wild to see people who fell for the pee tapes still around almost a decade later insisting that, this time, the latest accusations will be supported by evidence rather than Glenn Beck style dot connecting.

            I admit I fell for Russiagate. I even voted for Biden in 2020. But I learned my lesson. What’s your excuse? Does Trump’s view on immigration upset you so much you’re willing to continue trusting a media that has done nothing but lie about him for a decade?

            As far as I can tell, what fundamentally differentiates people who simply don’t like Trump for all the legitimate reasons to dislike Trump from the people who go full blown Rachel Maddow is deep-seated liberal universalism. For most people, Trump is merely a shady character and a serial liar. That’s why my dad hates Trump. But for others, he attacks the core of their worldview. Those folks will gobble up any shred of innuendo no matter how far-fetched and no matter how discredited the source.

            • ben_w a minute ago

              You're proving the point here.

              You don't need to trust the media or care about his views on immigration to know that the guy got impeached twice, that he got 34 felony convictions, that he's lost lawsuits regarding sexual assault claims, and that sexual assault claims against him go back to the 70s and involve at least 28 women, as well as him walking in on naked teenage pageant contestants.

              The possibility of pee tapes was funny, but did anyone really care if golden shower was a liquid reference or a "24 carat (plated)" like his redecoration of the oval office?

        • theultdev 8 hours ago

          Sure. What evidence would you like to use to lock Trump up?

          Point to an email in this dump or anything else.

          It's clear as day Trump cut ties when he found out who he was and was against him.

          Not so much for others.

          • wredcoll 7 hours ago

            > cut ties after learning he was poaching girls from Mar-a-Lago.

            This is, uh, not the slam dunk you seem to think it is.

      • crystal_revenge 9 hours ago

        What I don't understand is the pretense of defending Trump at all. I mean, it's clear that even if you watched Trump assault a 13 year old with your own eyes, it wouldn't impact your support for him. Why pretend that there is some moral divide between Bill Clinton and Donald Trump in this when you can just say "I support Donald Trump no matter what, and despise Bill Clinton no matter what"?

        Personally I've never been shocked that some of the most powerful people in the world like to go to a private sex-island where they could do as they pleased. That's precisely the incentive to becoming so incredibly powerful in the first place: to be able to pursue personal gain with fewer and fewer consequences.

      • toyg 11 hours ago

        Clinton at least has not been in office for 25 years. Trump is still in office. Surely the priority should be to get the bad people out of institutions asap...?

      • protocolture 9 hours ago

        >Of course, deflect discussion to Trump

        Interesting attempted deflection away from Trump.

      • timeon 11 hours ago

        > Trump gave information against Epstein in 2009

        Pre-2009 records on Trump there are nasty. One example:

        > ... It was at these series of parties that I personally witnessed the Plaintiff being forced to perform various sexual acts with Donald J. Trump and Mr. Epstein. Both Mr. Trump and Mr. Epstein were advised that she was 13 years old. I personally witnessed four sexual encounters that the Plaintiff was forced to have with Mr. Trump during this period, including the fourth of these encounters where Mr. Trump forcibly raped her despite her pleas to stop.

        Only difference between Clinton and Trump is that Trump is still president.

      • sanktanglia 12 hours ago

        Trump was with Epstein in 2017, he didn't cut ties at all

        • theultdev 12 hours ago

          That's a lie that has already been proven false since Trump's entire trip was documented.

          Love how we have actual evidence against people but discussions always devolve into some conspiracy related to Trump.

          ----

          Based on the available evidence, there is no confirmed meeting between Trump and Epstein in 2017. While both men were in Palm Beach during Thanksgiving week 2017, there is no direct evidence they met.

          Here's what we know about their presence in Palm Beach that week:

          - Trump was at Mar-a-Lago from November 21-26, 2017

          - Epstein owned a mansion in Palm Beach and was known to be in the area

          - Epstein mentioned both Trump and himself being "down there" (Palm Beach) in an email exchange on November 23, 2017

          While there were claims circulating online that Trump spent Thanksgiving with Epstein in 2017, these claims have been thoroughly investigated and found to be unsubstantiated.

          Trump's official calendar for that week shows his activities included:

          - Thanking military members on a virtual call

          - Visiting Coast Guard members at Lake Worth Inlet Station

          - Playing golf with Tiger Woods and Dustin Johnson

          • protocolture 9 hours ago

            >Trump's official calendar for that week shows his activities included:

            Damn, Trump would have 100% listed his sex crimes on his official calendar. Case closed.

            • theultdev 8 hours ago

              Yeah it's most likely he snuck out from the SS and had thanksgiving with a pedo while president. /s

              No reason to talk about anyone who actually corresponded with Epstein I guess.

              • protocolture 7 hours ago

                Plenty of reason. Nab everyone. But deflecting criticism from the pedo in chief is a weird look.

          • phatfish 10 hours ago

            Thanks ChatGPT.

            • theultdev 8 hours ago

              This entire thread is about AI generated content from emails.

              But we are human, so we can verify sources collected by AI. Care to dispute anything?

          • culi 11 hours ago

            Even if Trump cut off ties with Epstein in 2017, he should clearly be held accountable for his past actions. Here's 2 pretty damning emails:

            ---

            Epstein to Maxwell 2011-04-02

            > i want you to realize that that dog that hasn’t barked is trump... [VICTIM] spent hours at my house with him ,, he has never once been mentioned. police chief. etc. im 75% there

            ---

            Epstein to Ruemmler 2018-08-23

            > you see, i know how dirty donald is. my guess is that non lawyers ny biz people have no idea. what it means to have your fixer flip

            • gruez 8 hours ago

              >Here's 2 pretty damning emails:

              The most "damning" emails are hearsay from other people?

            • theultdev 11 hours ago

              [flagged]

              • pohl 9 hours ago

                Could you explain how “no confirmed meeting” implies “they never met”?

                • theultdev 8 hours ago

                  You think he snuck out from secret service and had an off-the-book thanksgiving with a pedo?

                  I'm saying there's no direct evidence he did and on face value it's ridiculous.

                  He was meeting the troops and golfing with Tiger Woods and happened to be in the same state Epstein had a house in.

                  Have any evidence otherwise, or just conspiracy theories?

  • anonnon 6 hours ago

    > The Bill Clinton entity is interesting.

    Not really. After Epstein got convicted in 2008, he set about trying to rehabilitate his image, to be seen as a philanthropist, a patron of science, and (perversely) a supporter of women and girls. He hired reputation management consultants to help carry out the project, with one of the models they used being Mike Milken (of Drexel infamy), who ultimately secured a pardon from Trump. A lot of prominent people, knowingly or not, served as "useful idiots" in this project, often due to financial incentives that were not wholly selfish. For example, the MIT and Harvard scientists whose labs and research he funded, and who visited his island for science-themed retreats. Clinton was probably another of Epstein's useful idiots, being lured in through his Clinton Global Initiative and the promise that Epstein, with his ample wealth, could help greatly expand it.

    • anonnon 2 hours ago

      > For example, the MIT and Harvard scientists whose labs and research he funded, and who visited his island for science-themed retreats.

      I should add that at least one of them, Marvin Minsky, was accused by name by the late Virginia Giuffre.

wnevets 15 hours ago

where is bubba?

  • analog31 15 hours ago

    Retired from public office.

    • trallnag 14 hours ago

      [flagged]

      • deelowe 14 hours ago

        Bubba was allegedly a nickname for clinton.

        • JKCalhoun 14 hours ago

          (Also allegedly the name of a horse Ghislaine Maxwell owned.)

        • wnevets 14 hours ago

          The nickname itself isn't alleged, which particular bubba is tho.