@Gargron What do you think about the idea of #mastodon servers supporting #atproto in addition to #activitypub?

By the way, thank you very much for your work and vision!

Regarding #Bluesky opening signups, one person on HN comments:

> The account portability is probably the biggest problem with the #fediverse right now.

And while that may be true from an end-user perspective, imho it is not the biggest problem. The fact that for a new dev building an #ActivityPub app is like a Viking trying to discover America is problematic. While #ATProto has the Holland-America line where you can check in for the ride.

@maegul
Right. Portability is a red herring though. Its effectively meaningless - who cares if i can move my website host. Who cares if its technically possible to block google from crawling me if everyone i know can only find me through that. Worse, if they can only find me through something that doesnt tell anyone where it gets its websites from. What benevolent organization has the resources to crawl everything without ulterior motive, and again how would i even know? If we organize a mass movement away from the main relay because its downright dangerous to have everything public, we would also need to make that relay break the protocol to be private access to similarly safe feed generators and app views, host our own PLC resolver - breaking the chain that makes migration possible, even if i still technically hold the git repo with the right hash, they own the means of finding it.

It all seems nice until you think about how any of it would actually play out. It would be cool to be wrong, id love to see a web utopia spring up from a platform billionaires pocket change, it would be such sweet irony. but seeing the same focus on account portability trotted out again and again without answering any of the hard questions keeps demonstrating how shallow this pipe dream really is. Moving accounts on the fedi is too hard, true, but to still think thats the main barrier and the answer being to give me literally only that and my ability to shop for a new algorithmic slop bucket as agency is just demonstrably wrong the second you think twice about it.

App views invert identity, and thats sort of interesting for a minute - instead of me subscribing to a platform, the platform subscribes to me. Except wait wasnt a series of platforms that mine all my data to serve me targeted shit the whole problem in the first place? Why is it compelling to require any indie app view to require me to surrender literally all my data to something they literally and uncritically call the firehose to be viable? Even if the indie app view is beautiful and lovely, it structurally depends on Sidechannel Information Leaks As A Service where now instead of a single platform having to win my trust to harvest my data, to merely exist online i literally must allow literally everyone to harvest it so i can have the privilege of them telling me what to look at.

They talk about how many thousands of algorithms have been created, but take a look at the list of them, it shows you how many people subscribe to them, sorted by popularity, descending. Compare that to the number of accounts on bluesky. Try and find the source code for the most popular feeds. If you can, are they doing anything more interesting than a manually curated list of accounts or posts that contain a hashtag? Try and find an algo that does actual account-level recommendation. Are there any? Youll find a handful of open source ones that have shut down. Try and run one. What happens? Who has the resources to run something like that? For how many accounts? Sample the firehose for a week, including all your feeds. Compare the number of accounts and posts that show up there to the number of accounts active in the firehose. If it really worked, youd expect to see a lot of different algorithms serving a lot of different posts from a lot of different people to a lot of different people. Evaluating how far that is from reality is left as an exercise to the reader.

And thats the network working as designed - saying nothing about the intrinsic impossibility of safety given the identity model.

I find very little of it compelling. I was interested in lexicons until i saw how they are used in practice as a wildly ineffective and repetitive API spec. I was interested in the use of DNS and DIDs for lightweight identity beacons until i saw how almost a year (?) later there are no viable plans to replace PLC with anything but Identity As A Service. I was interested in the PDS as a signed data store until i saw how it was only discoverable and useful by being crawled by a single public relay. Its a hodgepodge of half finished ideas that could be cool if any critical feedback would actually affect the design.

As i always say when i talk about this, id love to be wrong, and if it turns out to be really cool ill happily join.

Its this part for me that tells the whole story: #atproto was designed for a new kind of advertising market, and when their VC money and puttering domain registration revenue streams dry up, control over the main firehose relay is a big gaping profit vector waiting to be capitalized on.

Checking in on whether #bluesky / #atproto has become any more like a communication medium, and... nope. almost unchanged since i looked at it last in June. Bluesky is a spectator platform where a small number of accounts receive most of the visibility and smaller accounts are effectively invisible. The introduction of new feed algorithms (to the degree that happened, there aren't really many that I can find in wide use) did not change that. This is a non-normative analysis: in some cases, it is good to have a medium that promotes some very small number of posts and accounts, eg. to surface singular events, etc.

From a 25h sample of the firehose...
- 600k posts, 2.4m likes, 250k boosts, 350k follows
- 40% of posts receive 0 likes, 70% receive <= 1
- accounts in the 99th percentile of likes received 44% of likes, accounts in the 95th percentile received 74%
- 40% of posts were from accounts within the 95th percentile of accounts by likes received.
- the maximum number of likes for a post by an account not in the 95th percentile (ie. outside the top 5%) is 32.
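
a minimal sketch of how concentration numbers like the ones above could be computed from a sample. this is not the actual analysis code: the input shapes (a dict of likes received per account, a list of like counts per post) and the function names are assumptions for illustration.

```python
def like_share_of_top(likes_by_account, top_fraction):
    """Fraction of all likes that went to the top `top_fraction` of accounts.

    likes_by_account: dict mapping account id -> likes received in the sample
    (hypothetical shape, accounts with zero likes should be included).
    """
    counts = sorted(likes_by_account.values(), reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0
    # always count at least one account so tiny samples don't return 0
    n_top = max(1, int(len(counts) * top_fraction))
    return sum(counts[:n_top]) / total


def fraction_with_at_most(likes_per_post, k):
    """Fraction of posts that received k likes or fewer."""
    if not likes_per_post:
        return 0.0
    return sum(1 for n in likes_per_post if n <= k) / len(likes_per_post)


# toy data, not from the real sample
accounts = {"a": 7, "b": 1, "c": 1, "d": 1}
posts = [0, 0, 1, 3, 6]
top_quarter_share = like_share_of_top(accounts, 0.25)
zero_like_posts = fraction_with_at_most(posts, 0)
```

with the real sample you would call `like_share_of_top(accounts, 0.01)` and `like_share_of_top(accounts, 0.05)` for the 99th and 95th percentile figures.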

The first plot below shows the cumulative sum of likes received on the y axis against each account in the sample on the x axis - this includes accounts that didn't post during the sample (but would still have posts that could be liked, so this also shows the extreme recency bias). The second plot is a hockeystick showing the number of likes (*not* cumulative sum) received on the y axis per post on the x axis.

For background, the default algorithm only cares about likes, boosts don't matter, which is why i am calculating things by likes here - they are the primary algorithmic signal.

These are the same calculations that I did back in June, but this time i'm leaving the firehose open to do a longer sample to be able to parse momentary virality from persistent effects.

edit: more on "where is the fedi comparison" and "why is it like this"

@NicoleCRust
started writing this and it sorta got away from me, sry is long:

working on this now, part of working on diy algorithms, making some stuff to be able to backfill (respectfully) data from fediverse instances. problem is any given instance only knows about a subset of interactions on remote posts in the same way it only knows about a subset of replies/posts.

That's a systematic undercount and a pretty bad one at that, so you need to go and backfill data from the original hosting server post by post to get an accurate count, which takes time esp if you want to do it respectfully (eg. only ask once for the data, which means delaying the query for ~a day or so until those numbers stabilize), and i want to be careful to respect fedi norms so am doing stuff like scanning bios for nobot/noindex tags, all of that is pretty slow going and it's just a curiosity project for me. I'm also looking into what i can get from relays/what the view looks like from mastodon.social and if i can just use that as a quick reference.
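
a rough sketch of the "respectful backfill" planning described above, under stated assumptions: the post dicts (`uri`, `author_bio`, `created_at`), the tag regex, and the one-day delay are all made up for illustration, and no network request is made here.

```python
import re
from datetime import datetime, timedelta, timezone

# matches "nobot" / "noindex" with or without a leading "#", case-insensitive
OPT_OUT = re.compile(r"(?<!\w)#?(?:nobot|noindex)\b", re.IGNORECASE)


def opted_out(bio):
    """True if the account bio carries a nobot/noindex tag."""
    return bool(OPT_OUT.search(bio))


def plan_backfill(posts, now, delay=timedelta(days=1)):
    """Return URIs that are safe to fetch exactly once.

    A post qualifies when its author has not opted out and it is at least
    `delay` old, so interaction counts have had time to settle.
    """
    ready = []
    for post in posts:
        if opted_out(post["author_bio"]):
            continue  # respect the opt-out, never query
        if now - post["created_at"] < delay:
            continue  # too fresh, counts are still moving
        ready.append(post["uri"])
    return ready


now = datetime(2024, 1, 3, tzinfo=timezone.utc)
posts = [
    {"uri": "a", "author_bio": "hi", "created_at": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"uri": "b", "author_bio": "hi", "created_at": datetime(2024, 1, 2, 18, tzinfo=timezone.utc)},
    {"uri": "c", "author_bio": "cats #noindex", "created_at": datetime(2024, 1, 1, tzinfo=timezone.utc)},
]
to_fetch = plan_backfill(posts, now)
```

only `"a"` ends up in `to_fetch`: `"b"` is too recent and `"c"` has opted out.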

contrast that to bsky where literally every action is funneled out of a single point as a purposeful part of its design, which makes analyses like these easier, but has other maybe undesirable consequences.

I had not thought about trying any additional platforms tho, twitter's API is closed, also not really interested in threads and not sure about their API, but it could be done.

social networks famously have power-law-ish degree distributions, so the null expectation wouldn't be a flat distribution, and that's not even really desirable from a small world network pov. Bluesky's just seems to be remarkably skewed even without a null to compare to though - if you aren't in the top 5% of all accounts (the 95th percentile), the max number of likes was 36. For that number to not have really moved while the network has really expanded in size and volume in the 6 months since i last measured is especially interesting/odd. You'd think that the long tail would "fatten out" as there are more people in smaller clusters, but none of the algos i can find are designed to really do that.

It seems to be the byproduct of a number of decisions - first and most obvious, the default algorithm sorts primarily (formerly, exclusively) on number of interactions with exponential time decay, so that of course creates a small number of popular posts/virality. second and increasingly obvious, there is an explicit 'like floor' in default settings for the 'following' feed, and another in the default 'discover' feed, and so the only way a post can gain visibility in the first place is to have a large number of followers with that floor turned off. third, quote tweets have a sort of well known canalizing effect: the main character syndrome is also the monodiscourse syndrome, as posts will link back to again a small number of already-popular posts, increasing their popularity, etc.
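
to make the first two points concrete, here is a toy ranking with exponential time decay and a like floor. this is a sketch under assumptions, not bsky's actual algorithm: the half-life, the floor value, and the post dict shape are invented for illustration.

```python
import math


def decayed_score(likes, age_hours, half_life_hours=6.0):
    """Likes discounted by exponential decay: the score halves every half-life."""
    return likes * math.exp(-math.log(2) * age_hours / half_life_hours)


def rank_feed(posts, like_floor=0, half_life_hours=6.0):
    """Drop posts under the like floor, then sort by decayed score, highest first."""
    visible = [p for p in posts if p["likes"] >= like_floor]
    return sorted(
        visible,
        key=lambda p: decayed_score(p["likes"], p["age_hours"], half_life_hours),
        reverse=True,
    )


posts = [
    {"id": "big", "likes": 1000, "age_hours": 12},
    {"id": "small", "likes": 3, "age_hours": 0},
    {"id": "mid", "likes": 50, "age_hours": 1},
]
no_floor = [p["id"] for p in rank_feed(posts)]
with_floor = [p["id"] for p in rank_feed(posts, like_floor=10)]
```

even without a floor the twelve hour old post with 1000 likes still outranks the brand new post with 3, and with a floor of 10 the small post is never shown at all.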

tbc i'm not opposed in principle to any of these technologies or ideas - i think we should have transparent, private algos here, quote posts are fine to me, etc. but it does seem that through either careless or (who knows) purposeful design that bsky has nudged itself in the direction of skewed interaction distributions.

Theoretically their system of algorithmic feeds could also address this - that was the goal, for each person to be able to see their own custom feed - but a) defaults are powerful, and b) custom algorithms are expensive to compute at scale across an entire network's worth of posts, so the only ones that you see that even try are ones that do really trivial things that can be fit into SQL queries like 'rank by number of my following accounts that liked this thing'. unfortunately, since the system is designed for people to set up algorithms as a service (rather than for themselves), i think the only way that you see really good personalized algos there is when it becomes profitable to sell ads on them - who knows maybe someone will volunteer a ton of server time for free, but i doubt it.
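
for a sense of how trivial 'rank by number of my following accounts that liked this thing' is, a sketch in a few lines. the inputs (a set of followed account ids and a list of `(liker, post)` pairs) are hypothetical shapes, not any real feed generator's data model.

```python
from collections import Counter


def rank_by_followed_likes(following, likes):
    """Rank posts by how many accounts in `following` liked them.

    following: set of account ids the viewer follows
    likes: iterable of (liker_id, post_uri) pairs
    Returns (post_uri, count) pairs, highest count first, ties broken by uri.
    """
    counts = Counter(post for liker, post in likes if liker in following)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


following = {"alice", "bob"}
likes = [
    ("alice", "post1"),
    ("bob", "post1"),
    ("carol", "post2"),  # carol isn't followed, so this like is ignored
    ("bob", "post3"),
]
ranking = rank_by_followed_likes(following, likes)
```

the expensive part isn't this function, it's doing it per viewer over every like in the network.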

Compare figure 3 here in the #atproto / #bluesky paper
https://bsky.social/about/bluesky-and-the-at-protocol-usable-decentralized-social-media-martin-kleppmann.pdf
To the diagram here:
https://bsky.social/about/blog/5-5-2023-federation-architecture

The paper figure is a lot cuter, but by linearizing it and presenting it as two parallel tracks they have obscured the most salient feature of the network: the big relay in the middle. Beyond "centralization bad," that pins down most of the undesirable and dangerous features of the protocol, and makes it seem like theres a lot more choice than there is.

The design purposefully hides the architecture: you dont know where your feed generators are drawing from, or those used by your friends. So you cant know what the effect of choosing a different relay would be, aka the main relay is always indispensable. Importantly the relays subscribe to you, you dont push to the relay, and since you arent really supposed to operate your own data store, you can be dropped from the network without knowing - the relay serves as an unaccountable point of moderation.

@jdp23 I don't see how partial relays would be possible in atproto. say some catastrophic event happens where people were dead set on splitting off from bsky the corporation. assume it's truly the top priority and nothing else goes until it happens. assume further still this is some unimaginable proportion of the userbase acting in concert - hell, say 25% want to go all at once. best case scenario for making an independent relay.

you create a new relay, migrate data to new PDSes, get that new relay to crawl the PDSes, so far so good. Now what tho? everyone on the new relay is invisible to everyone on the old relay and vice versa. you are back to 0 appviews and 0 feed generators because they all are listening to the main relay. every single appview and feed generator now needs to choose to listen to the new relay. but why would they? you're still responsible as an appview or feed generator for the content you distribute, and you don't know who this new relay is. that's assuming there's no ill will in such a massive split.

so you set up a new basic set of appviews and feed generators. do they also listen to the main relay? do you mirror the old relay in the new relay? do you let the old relay crawl the pdses too? if so, what was the point of the split? now you need to redesign all the existing appviews and feed generators in flight to deduplicate records, which is possible since they're content addressed, but i would doubt they're designed to handle multiple relays because none have existed before now.
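
the deduplication step itself is the easy part. a minimal sketch, assuming each relay hands you events as dicts that carry a `cid` (content address) field, which is a made-up shape for illustration and not the real firehose format:

```python
def merge_relays(*streams):
    """Yield each record once, even if several relays carry it.

    Records are content addressed, so two copies of the same record share a
    cid no matter which relay delivered them. Streams are read one after
    another here; a real consumer would interleave live streams.
    """
    seen = set()
    for stream in streams:
        for event in stream:
            if event["cid"] in seen:
                continue  # already got this record from another relay
            seen.add(event["cid"])
            yield event


main_relay = [{"cid": "c1", "text": "hi"}, {"cid": "c2", "text": "hello"}]
new_relay = [{"cid": "c2", "text": "hello"}, {"cid": "c3", "text": "hey"}]
merged = [e["cid"] for e in merge_relays(main_relay, new_relay)]
```

the hard part is everything around it: every existing appview and feed generator would need to be changed to subscribe to more than one relay in the first place.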

what about DIDs? most of the existing infrastructure is designed to just use PLC, which is just a lookup table that bsky also owns. shoot. but we're saved by magic here, because remember there is no acrimony in this enormous network redefining split! So say bsky the corporation is kind enough to keep letting people register DIDs with PLC. we didn't quite make the clean break we were after, but hey it's only the fundamental ability to exist on the network that we were unable to leave behind, and we'll always be reliant on bsky's goodwill for that until someone makes a DID method that works and then we redesign all the appviews and feed generators again.

So now after all that... we're still invisible to most people on the main relay?! oh right because bsky the corporation also provides the default feeds, and despite the high numbers claimed in the press releases, alternate feeds are actually only sparsely used and as a rule very simple hashtag/account feeds because doing anything else is ridiculously expensive. Bluesky the appview is provided by bluesky the corporation, and that's what's actually fetching and hydrating the feeds for us anyway, so even if the feed generators swap over, we'd still be invisible to everyone still on bsky the app. More magic! bsky the appview chooses to crawl and hydrate our posts. We're pretty far from our initial intention of a clean break, but what choice do we have? Now we're partially viewable, some of the time, on some non-default feeds, and there's no way at all to tell within the interface which those are. All it took was totally redesigning most of the network and an enormous amount of goodwill.

What about labels? What about all the automated content moderation bsky the appview does like scanning images and etc? Who moderates? How? Who's paying for all this anyway? The new relay is bound to be extremely expensive - either it's too small and you don't have the critical mass to make any of the above happen, or it's very large and you run into exactly the same problems of scale that led bsky the corporation to need seed funding and eventually a revenue model. Where on fedi people pay for servers and donate to their instance because it's a visible part of their experience with moderators they know and like, now all that labor is diffused among a bunch of anonymous service providers - this is by design! It was supposed to depersonalize the network and make it so everyone is just an interchangeable part that you can shop around between. What keeps people donating to the new PDSes, the new relay, the new appviews, the new feed generators? How would they even know how to do that? Meanwhile the network is continuing to tack on features with some combination of bsky corporation fiat, behind the scenes server magic, and so on, so the best we can hope for is partial compatibility and an always-inferior experience.

And that's just to get to 2 relays. what about 3? Remember how much people complained about how hard it was to find an instance? That's absolutely nothing to the combinatoric complexity of PDS * relay * feed generator * app view. How on earth will anyone know how to follow and talk to their friends? To see your friend's post, if they are not on the main relay, you need to get just the right combination of parameters. Even in this perfect scenario with unlimited resources, attention, goodwill, and organization, we couldn't even manage to make a clean break and still have to be reliant on bsky for basically the entire stack, at least partially.
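
a toy model of that "right combination of parameters" problem. every name and the topology below are hypothetical: each dict just records who listens to whom, and the function enumerates the relay * feed generator * app view combinations through which a friend's PDS is actually visible.

```python
from itertools import product


def working_combinations(friend_pds, crawls, reads, hydrates):
    """List (relay, feed generator, app view) combos that can show the friend.

    crawls:   relay -> set of PDSes it crawls
    reads:    feed generator -> set of relays it listens to
    hydrates: app view -> set of feed generators it serves
    """
    return [
        (relay, feed, app)
        for relay, feed, app in product(crawls, reads, hydrates)
        if friend_pds in crawls[relay]
        and relay in reads[feed]
        and feed in hydrates[app]
    ]


crawls = {"main": {"pds.big"}, "indie": {"pds.small"}}
reads = {"discover": {"main"}, "indie-feed": {"indie", "main"}}
hydrates = {"bsky.app": {"discover"}, "indie-app": {"discover", "indie-feed"}}

total = len(crawls) * len(reads) * len(hydrates)
paths = working_combinations("pds.small", crawls, reads, hydrates)
```

in this toy network there are 8 possible combinations and exactly one of them, `("indie", "indie-feed", "indie-app")`, shows a friend hosted on `pds.small`, and nothing in the interface tells you which one.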

So maybe some small, closed group could make subnetworks, and that is lovely! i'm glad that tech is out there. There's no such thing as privacy on those networks unless they redesign indigo, but hey it's a start! But that looks nothing like the interoperable paradise that's on the label.

In reality we don't get perfect conditions though, and so we'll get stuck at step one: new relay, zero appviews, zero feed generators, zero visibility, and zero people. Again I don't think alternate relays are possible with atproto -- if they were, then there would be no reason to invest $15 million in bluesky.

Bsky raises $15m from Blockchain Capital, the VC's press release hints at what they're interested in:

https://www.blockchaincapital.com/blog/bluesky-13m-users-and-growing-our-investment-in-blueskys-re-imagined-social-network

> Bluesky [is] designed to foster a new ecosystem of applications. [...] It is interoperable with existing internet protocols and blockchain-based systems, opening the door for a more connected, less siloed social experience. Since its launch in April 2023, over 100 clients have been built on the AT Protocol, and users have created more than 50,000 custom feeds. And the best part of it all? By building on top of the AT protocol, these developers have access to Bluesky’s 13M users worldwide.

The VC firm sees bsky and their ownership of the relay as being a potentially very lucrative chokepoint, where the users of bluesky are the asset to rent to platform developers who want "access" to them. I've written before how atproto's decentralization is effectively meaningless with the relay system, where it's decentralized in the same sense as google alerts is decentralized - sure you can host your own PDS, but it's only useful because the main relay crawls it, and then either bsky or someone else who (inevitably) pays for access can send it back to you.

edit: here's why i think the relay is a chokepoint and why there will never be a second: https://neuromatch.social/@jonny/113365406995624763

#atproto #bsky #bluesky #fediverse

@luna fun fact i found out literally today, your did service entry not only needs to be exactly of type `AtprotoPersonalDataServer`, it also needs to have an exact id of `#atproto_pds` and your signing key / verificationMethod needs an exact id of `#atproto`. this is separate from the `#atproto_labeler` + `AtprotoLabeler` which needs a *separate* key with exact id of `#atproto_label`, even if it's the same key / key material as the `#atproto` key

https://atproto.com/specs/label#labeler-service-identity
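
a small checker for the exact ids and types described above, as a sketch. the document shape follows the usual DID document layout (`service` and `verificationMethod` lists), and accepting fully qualified ids like `did:example:123#atproto` by comparing only the `#fragment` is an assumption of this sketch. the function names and the example DID are hypothetical.

```python
def fragment(identifier):
    """Return the '#...' part of an id, eg. 'did:example:123#atproto' -> '#atproto'."""
    _, sep, frag = identifier.partition("#")
    return "#" + frag if sep else identifier


def check_did_document(doc, labeler=False):
    """Return a list of problems; an empty list means the ids and types line up."""
    problems = []
    services = {fragment(s.get("id", "")): s for s in doc.get("service", [])}
    key_ids = {fragment(k.get("id", "")) for k in doc.get("verificationMethod", [])}

    pds = services.get("#atproto_pds")
    if pds is None or pds.get("type") != "AtprotoPersonalDataServer":
        problems.append("need service #atproto_pds of type AtprotoPersonalDataServer")
    if "#atproto" not in key_ids:
        problems.append("need verificationMethod with id #atproto")

    if labeler:
        svc = services.get("#atproto_labeler")
        if svc is None or svc.get("type") != "AtprotoLabeler":
            problems.append("need service #atproto_labeler of type AtprotoLabeler")
        # a separate entry is required even if it reuses the #atproto key material
        if "#atproto_label" not in key_ids:
            problems.append("need verificationMethod with id #atproto_label")
    return problems


doc = {
    "id": "did:example:123",
    "service": [{"id": "#atproto_pds", "type": "AtprotoPersonalDataServer",
                 "serviceEndpoint": "https://pds.example"}],
    "verificationMethod": [{"id": "did:example:123#atproto"}],
}
account_problems = check_did_document(doc)
labeler_problems = check_did_document(doc, labeler=True)
```

the same document passes as a plain account but fails as a labeler until both the `#atproto_labeler` service and the `#atproto_label` key entry are added.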