

My current jq project: create a Diaspora post-abstracter


Given the lack of a search utility on Diaspora*, my evolved strategy has been to create an index or curation of posts, generally with a short entry for each post consisting of the title, a brief summary (usually the first paragraph), the date, and the URL.

I'd like to group these by time segment, say, by month, quarter, or year (probably quarter/year).

And as I'm writing this, I'm thinking that it might be handy to indicate some measure of interactions --- comments, reshares, likes, etc.

My tools for developing this would be my Diaspora* profile data extract, and jq, the JSON query tool.

It's possible to do some basic extraction and conversion pretty easily. Going from there to a more polished output is ... more complicated.
A typical original post might look like this (excluding the subscribed_pods_uris array):
{
  "entity_type": "status_message",
  "entity_data": {
    "author": "dredmorbius@joindiaspora.com",
    "guid": "cc046b1e71fb043d",
    "created_at": "2012-05-17T19:33:50Z",
    "public": true,
    "text": "Hey everyone, I'm #NewHere. I'm interested in #debian and #linux, among other things. Thanks for the invite, Atanas Entchev!\r\n\r\nYet another G+ refuge.",
    "photos": []
  }
}

Key points here are:
  • entity_type: Values "status_message" or "reshare".
  • author: This is the user_id of the author, yours truly (in this case in my DiasporaCom incarnation).
  • guid: Can be used to construct a URL in the form of https://<hostname>/posts/<guid>
  • created_at: The original posting date, in UTC ("Zulu" time).
  • public: Visibility status, either true or false. The field also appears to be missing from a significant number of posts.
  • text: The post text itself.
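The guid-to-URL rule above can be sketched in jq itself. This is a minimal sketch, assuming the export keeps its posts under .user.posts (as the basic extract command in this post does) and that jq 1.5 or later is installed. The function name post_index and the default hostname are illustrative choices, not part of the export.

```shell
# post_index: read a Diaspora* export on stdin and print one
# "created_at<TAB>URL" line per post.
# $1 is the pod hostname to link to (the default is only an assumption).
post_index() {
  jq -r --arg host "${1:-diaspora.glasswings.com}" '
    .user.posts[].entity_data
    | [.created_at, "https://\($host)/posts/\(.guid)"]
    | @tsv
  '
}
```

Usage would be something like `post_index joindiaspora.com < export.json`, where export.json stands in for whatever the profile extract is called.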
A reshare looks like:
{
  "entity_type": "reshare",
  "entity_data": {
    "author": "dredmorbius@joindiaspora.com",
    "guid": "5bfac2041ff20567",
    "created_at": "2013-12-15T12:45:08Z",
    "root_author": "willhill@joindiaspora.com",
    "root_guid": "53e457fd80e73bca"
  }
}

Again, excluding the .subscribed_pods_uris. In most cases, reshares are of less interest than direct posts.

Interestingly, I've a pretty even split between posts and reshares (52% status_message, that is, post).
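A split like that can be checked with a short jq tally. The sketch below assumes the same .user.posts layout; type_counts is a made-up helper name.

```shell
# type_counts: count posts per entity_type in an export read from stdin.
type_counts() {
  jq -r '
    [.user.posts[].entity_type]      # collect every type into one array
    | group_by(.)                    # sort and group identical values
    | map("\(.[0]): \(length)")      # one "type: count" string per group
    | .[]
  '
}
```

Because group_by sorts its input, reshare is listed before status_message.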

My theory in creating an abstract is:
  • Automation is good.
  • It's easier to peel stuff off an automatically-created abstract than to add bits back in manually.
  • The compilation should contain only public posts and exclude reshares.
Issues:
  • It's relatively easy to create a basic extract:
jq '.user.posts[].entity_data | .author, .guid, .created_at, .text'

Adding in selection and formatting logic gets ... more complicated.

Among other factors, jq is a very quirky language.

Desired Output Format


I would like to produce output which renders something like this for any given post:
Diaspora Tips: Pods, Hashtags & Following
For the many Google Plus refugees showing up on Diaspora and Pluspora, some pointers: ...
https://diaspora.glasswings.com/posts/a53ac360ae53013611b60218b786018b (2018-10-10 00:45)
What if any options are there for running Federated social networking tools on or through # or related router systems on a single-user or household basis?
I'm trying to coordinate and gather information for # (and other) users looking to migrate to Fediverse platforms, and I'm aware that OpenWRT, # (I have a #, and several other router platforms can run services, mostly # that I'm aware. ...
https://diaspora.glasswings.com/posts/91f54380af58013612800218b786018b (2018-10-11 07:52)
The original posts can of course be viewed at the URLs shown.

What this is doing is:
  • Extracting the first line of the post text itself.
  • Stripping all formatting from it.
  • Bolding the result by surrounding it in ** Markdown.
  • Including the second paragraph, terminating it in an ellipsis (...).
  • Including a generated URL, based on the GUID, and here parked on Glasswings. (I might also create links to Archive.Today and Archive.Org of the original content.)
  • Including the post date, with time in YYYY-MM-DD hh:mm resolution.
Including the month and year where those change might also be useful for creating archives.
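The steps listed above might be sketched in jq roughly as follows. This is not a finished formatter: "stripping all formatting" is approximated by removing asterisks, backticks, and leading Markdown heading marks, entries without text (reshares) are skipped, and the function name render_entries and the default hostname are assumptions.

```shell
# render_entries: print a bold title, an optional second paragraph,
# and a "URL (date)" line for each post with text, read from stdin.
render_entries() {
  jq -r --arg host "${1:-diaspora.glasswings.com}" '
    .user.posts[].entity_data
    | select(.text != null)
    # split on runs of CRLF or LF, then drop whitespace-only pieces
    | [ .text | splits("(\r?\n)+") | select(test("\\S")) ] as $paras
    # crude formatting strip: asterisks, backticks, leading heading marks
    | ($paras[0] // "" | gsub("[*`]"; "") | sub("^#+ +"; "")) as $title
    # "2018-10-10T00:45:12Z" becomes "2018-10-10 00:45"
    | ((.created_at // "")[0:16] | sub("T"; " ")) as $when
    | "**\($title)**",
      (if ($paras | length) > 1 then "\($paras[1]) ..." else empty end),
      "https://\($host)/posts/\(.guid) (\($when))",
      ""
  '
}
```

Slicing the timestamp as a string avoids jq's date functions, which is a simplification that only holds while created_at stays in the UTC form shown in the sample posts.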

Specific questions / challenges:

  • How to conditionally export only public posts.
  • How to conditionally export only status_message (that is, original) posts, rather than reshares.
  • How to create lagged "oldYear" and "oldMonth" variables.
  • How to conditionally output content when computed Month and Year values > oldMonth and oldYear respectively. Goal is to create ## .year and ### .month segments in output.
  • How to output up to two paragraphs, where posts may consist of fewer than two separate text lines, and lines may be separated by multiple or only single linefeeds \r\n.
  • Collect and output hashtags used in the post.
  • Include counts of comments, reshares, likes, etc. I'm not even sure this is included in the JSON output.
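For the first two questions, jq's select filter is probably enough. In the sketch below, public_originals is a hypothetical name, and posts with no public field are treated as non-public, which is an assumption about how to read the missing values.

```shell
# public_originals: emit the entity_data of each original post
# that is explicitly marked public, one compact JSON object per line.
public_originals() {
  jq -c '
    .user.posts[]
    | select(.entity_type == "status_message")   # drop reshares
    | .entity_data
    | select(.public == true)                     # missing or false is skipped
  '
}
```

To keep posts where the field is missing, `select(.public != false)` would be the looser test.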
There might be more, but that's a good start.
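The lagged oldYear and oldMonth idea maps fairly naturally onto jq's foreach, which carries a state value from one item to the next. A sketch, assuming posts are first sorted by created_at and that a "YYYY-MM" month heading is acceptable; dated_outline is an illustrative name and the per-post line is only a placeholder for the real entry.

```shell
# dated_outline: print "## year" and "### year-month" headings
# whenever those values change, followed by one line per post.
dated_outline() {
  jq -r '
    [.user.posts[].entity_data | select(.created_at != null)]
    | sort_by(.created_at)
    | foreach .[] as $p (
        {year: null, month: null, lines: []};      # initial state
        ($p.created_at[0:4]) as $y
        | ($p.created_at[0:7]) as $ym
        | {
            # .year and .month here are still the previous post values
            lines: (
              (if $y != .year then ["## \($y)"] else [] end)
              + (if $ym != .month then ["### \($ym)"] else [] end)
              + ["\($p.created_at[0:10]) \($p.guid)"]
            ),
            year: $y,
            month: $ym
          };
        .lines[]                                    # emit lines for this post
      )
  '
}
```

Comparing with != rather than > keeps the logic independent of sort direction, so the same filter would work for a newest-first listing.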

And of course, if I have to invoke other tools for part of the formatting, that's an option, though an all-in-jq solution would be handy.
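Hashtag collection, at least, looks doable without leaving jq, using its scan function. The sketch below treats any "#" followed by word characters as a tag, so URL fragments in a post would be picked up as well; post_hashtags is a made-up name.

```shell
# post_hashtags: for each post with text, emit its guid and the
# sorted, de-duplicated list of hashtags found in the text.
post_hashtags() {
  jq -c '
    .user.posts[].entity_data
    | select(.text != null)
    | {guid, tags: ([.text | scan("#\\w+")] | unique)}
  '
}
```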

in reply to Doc Edward Morbius

I'm sad. I never made up my mind to go through this as well, instead this is the last post on pluspora probably. :|
in reply to Doc Edward Morbius

@DEFUNCT Carsten Raddatz (劉愷恩) -> now at nerdica @Carsten Raddatz Most of your posts should be accessible through other federated instances. Diasp.org most especially. Though it seems that you need to have the original URL or GUID to do this. If you exported your data you should still be able to do this.

I'd have said "joindiaspora.com", as I'd been on that and followed you from there, but ... well, it's also dead.

@Isaac Kuo also turned up another instance, I believe Socialhome (I need to go looking for that) which is both well-federated and searchable from other instances.

For shins'n'grits: Sadly, attempting to view a user's profile from a third-party pod does not seem to work. That would be one way to grab a fully-federated instance.

Now to track down Isaac's find again...
in reply to Doc Edward Morbius

@DEFUNCT Carsten Raddatz (劉愷恩) -> now at nerdica @Carsten Raddatz Socialhome.network:

https://socialhome.network

The user reference hash is different, but you can find it by searching for a specific handle.

Looking for carstenraddatz@pluspora.com gives:

https://socialhome.network/p/c05bd2c3-426f-4b94-bb0e-28456b36760e/

I'm not aware of a way of automatically scraping / requesting that content, but you should be able to manually page through posts.

The posts are not referenced by the Diaspora* GUID. E.g.:

https://socialhome.network/content/10897028/weil-escher-escher-visualization-art-infi/

Let me take a deeper look at that...

.... I'm not seeing an obvious way to grab a JSON abstract of either profiles or posts.

There's source at GitLab: https://git.feneas.org/socialhome/socialhome

Or @Jason Robinson 🐍🍻 might be able to offer some pointers.