
5 posts tagged with "updates"


@atproto/api v0.14.0 release notes

· 11 min read

Today we are excited to announce the availability of version 0.14 of our TypeScript SDK on npm.

This release is a big step forward, significantly improving the type safety of our @atproto/api package. Let’s take a look at the highlights:

  • Lexicon derived interfaces now have an explicitly defined $type property, allowing unions to be properly discriminated.
  • Lexicon derived is* utility methods no longer unsafely type cast their input.
  • Lexicon derived validate* utility methods now return a more precise type.
  • New lexicon derived isValid* utility methods are now available.

Context

Atproto is an "open protocol". This means, among other things, that the data structures handled through the protocol are extensible. Lexicons (the syntax used to define the schema of these data structures) can describe placeholders where arbitrary data types (defined through third-party Lexicons) can be used.

An example of such a placeholder exists in the lexicon definition of a Bluesky post (app.bsky.feed.post). That schema specifies that posts can have an embed property, defined as follows:

  "embed": {
    "type": "union",
    "refs": [
      "app.bsky.embed.images",
      "app.bsky.embed.video",
      "app.bsky.embed.external",
      "app.bsky.embed.record",
      "app.bsky.embed.recordWithMedia"
    ]
  }

The type of the embed property is what is called an "open union". It means that the embed field can basically contain anything, though we usually expect it to be one of the known types defined in the refs array of the lexicon schema (an image, a video, a link or another post).

Systems consuming Bluesky posts need to be able to determine what type of embed they are dealing with. This is where the $type property comes in. This property uniquely identifies the lexicon schema that must be used to interpret the data, and it must be provided wherever a union value is expected. For example, a post with a video would look like this:

{
  "text": "Hey, check this out!",
  "createdAt": "2021-09-01T12:34:56Z",
  "embed": {
    "$type": "app.bsky.embed.video",
    "video": { /* reference to the video file, omitted for brevity */ }
  }
}

Since embed is an open union, it can be used to store anything. For example, a post with a calendar event embed could look like this:

{
  "text": "Hey, check this out!",
  "createdAt": "2021-09-01T12:34:56Z",
  "embed": {
    "$type": "com.example.calendar.event",
    "eventName": "Party at my house",
    "eventDate": "2021-09-01T12:34:56Z"
  }
}

Note: Only systems that know about the com.example.calendar.event lexicon can interpret this data. The official Bluesky app will typically only know about the data types defined in the app.bsky lexicons.

Revamped TypeScript interfaces

In order to facilitate working with the Bluesky API, we provide TypeScript interfaces generated from the lexicons (using a tool called lex-cli). These interfaces are made available through the @atproto/api package.

For historical reasons, these generated types were missing the $type property. The interface for the app.bsky.embed.video, for example, used to look like this:

export interface Main {
  video: BlobRef
  captions?: Caption[]
  alt?: string
  aspectRatio?: AppBskyEmbedDefs.AspectRatio
  [k: string]: unknown
}

Because the $type property is missing from that interface, developers could write invalid code, without getting an error from TypeScript:

import { AppBskyFeedPost } from '@atproto/api'

// Aliased for clarity
type BlueskyPost = AppBskyFeedPost.Main

// Invalid post, but TypeScript did not complain
const myPost: BlueskyPost = {
  text: 'Hey, check this out!',
  createdAt: '2021-09-01T12:34:56Z',
  embed: {
    // Notice how we are missing the `$type` property here
    video: {
      /* reference to the video file, omitted for brevity */
    },
  },
}

Similarly, because a Bluesky post’s embed property was previously typed like this:

export interface Record {
  // ...
  embed?:
    | AppBskyEmbedImages.Main
    | AppBskyEmbedVideo.Main
    | AppBskyEmbedExternal.Main
    | AppBskyEmbedRecord.Main
    | AppBskyEmbedRecordWithMedia.Main
    | { $type: string; [k: string]: unknown }
}

It was possible to create a post with a completely invalid "video" embed, and still get no error from the type system:

import { AppBskyFeedPost } from '@atproto/api'

// Aliased for clarity
type BlueskyPost = AppBskyFeedPost.Main

const myPost: BlueskyPost = {
  text: 'Hey, check this out!',
  createdAt: '2021-09-01T12:34:56Z',
  embed: {
    $type: 'app.bsky.embed.video',
    video: 43, // This is invalid, but TypeScript does not complain
  },
}

We have fixed these issues by making the $type property in the generated interfaces explicit. The app.bsky.embed.video interface now looks like this:

export interface Main {
  $type?: $Type<'app.bsky.embed.video', 'main'>
  video: BlobRef
  captions?: Caption[]
  alt?: string
  aspectRatio?: AppBskyEmbedDefs.AspectRatio
}

Notice how the $type property is defined as optional (?:) here. This is because lexicon schemas can also be referenced from places other than open unions, where there may be no ambiguity as to how the data should be interpreted. For example, an embed that represents a "Record With Media" has a record property that expects an app.bsky.embed.record object:

export interface Main {
  $type?: $Type<'app.bsky.embed.recordWithMedia', 'main'>
  record: AppBskyEmbedRecord.Main // Also used in post's `embed` property
  media: /* omitted */
}

Since there is no ambiguity as to the type of the data here, making the $type property required would cause unnecessary bloat. Keeping it optional allows a "Record With Media" to be declared as follows:

const recordWithMedia: RecordWithMedia = {
  $type: 'app.bsky.embed.recordWithMedia',
  record: {
    // $type is not needed here
    record: {
      /* omitted */
    },
  },
  media: {
    /* omitted */
  },
}

Because the $type property of objects is required in some contexts but optional in others, we introduced a new utility type to make it required when needed. The $Typed utility marks an interface’s $type property as non-optional in contexts where it is required:

export type $Typed<V> = V & { $type: string }

Using that utility, the embed property of posts is now defined as follows:

export interface Record {
  // ...
  embed?:
    | $Typed<AppBskyEmbedImages.Main>
    | $Typed<AppBskyEmbedVideo.Main>
    | $Typed<AppBskyEmbedExternal.Main>
    | $Typed<AppBskyEmbedRecord.Main>
    | $Typed<AppBskyEmbedRecordWithMedia.Main>
    | { $type: string }
}

In addition to preventing the creation of invalid data, as seen at the beginning of this section, this change also makes it possible to properly discriminate types when accessing data. For example, one can now do:

import { AppBskyFeedPost } from '@atproto/api'

// Aliased for clarity
type BlueskyPost = AppBskyFeedPost.Main

// Say we got some random post somehow (typically via an api call)
declare const post: BlueskyPost

// And we want to know what kind of embed it contains
const { embed } = post

// We can now use the `$type` property to disambiguate
if (embed?.$type === 'app.bsky.embed.images') {
  // The `embed` variable is fully typed as `$Typed<AppBskyEmbedImages.Main>` here!
}

is* utility methods

The example above shows how data can be discriminated based on the $type property. There are, however, several disadvantages to relying on string comparison for discriminating data types:

  • Having to use inline strings yields a lot of code, hurting readability and bundle size.
  • In some instances, the $type property can have either of two values describing the same lexicon. An "images" embed, for example, can use both app.bsky.embed.images and app.bsky.embed.images#main as its $type. This makes the previous point even worse.

In order to alleviate these issues, the SDK provides type checking predicate functions. In their previous implementation, the is* utilities were defined as follows:

export interface Main {
  images: Image[]
  [x: string]: unknown
}

export function isMain(value: unknown): value is Main {
  return (
    value != null &&
    typeof value === 'object' &&
    '$type' in value &&
    (value.$type === 'app.bsky.embed.images' ||
      value.$type === 'app.bsky.embed.images#main')
  )
}

As can be seen from the example implementation above, the predicate functions would cast any object containing the expected $type property into the corresponding type, without checking for the actual validity of other properties. This could yield runtime errors that could have been avoided during development:

import { AppBskyEmbedImages } from '@atproto/api'

// Alias, for clarity
const isImages = AppBskyEmbedImages.isMain

// Get an invalid embed somehow
const invalidEmbed = {
  $type: 'app.bsky.embed.images',
  // notice how the `images` property is missing here
}

// This predicate function only checks the value of the `$type` property, making the condition "true" here
if (isImages(invalidEmbed)) {
  // No TypeScript error, BUT causes a runtime error because there is no "images" property!
  console.log('First image:', invalidEmbed.images[0])
}

The root of the issue here is that the is* utility methods perform type casting of objects solely based on the value of their $type property. There were basically two ways of fixing this issue:

  1. Alter the implementation to actually validate the object's structure. This is a non-breaking change that has a negative impact on performance.
  2. Alter the function signature to describe what the function actually does. This is a breaking change because TypeScript would start (rightfully) returning lots of errors in places where these functions are used.

Because this release introduces other breaking changes, and because adapting our own codebase to this change showed it made more sense, we decided to adopt the latter option.

In lots of cases where data needs to be discriminated, this change in the signature of the is* function will actually not cause any issue when upgrading the version of the SDK. This is the case for example when working with data obtained from the API. Because an API is a "contract" between a server and a client, the data returned by the server is "guaranteed" to be valid. In these cases, the is* utility methods provide a convenient way to discriminate between valid values.

import { AppBskyEmbedImages, AppBskyFeedPost } from '@atproto/api'

// Aliased for clarity
type BlueskyPost = AppBskyFeedPost.Main
const isImages = AppBskyEmbedImages.isMain

// Get a post from the API (the API's contract guarantees the validity of the data)
declare const post: BlueskyPost

// The `is*` utilities are an efficient way to discriminate **valid** data based on their `$type`
if (isImages(post.embed)) {
  // `post.embed` is fully typed as `$Typed<AppBskyEmbedImages.Main>` here!
}

For other cases, when the data's validity is not known at dev time, we added new isValid* utility methods that ensure a value properly satisfies the interface.

import { AppBskyEmbedImages } from '@atproto/api'

// Aliased for clarity
type Images = AppBskyEmbedImages.Main
const isValidImages = AppBskyEmbedImages.isValidMain

// Get an embed with unknown validity somehow
declare const embed: unknown

// The following condition will be true if, and only if, the value matches the `Images` interface
if (isValidImages(embed)) {
  // `embed` is of type `Images` here
}

These methods perform full data validation, making them somewhat slower than the is* utility methods. They can, however, be used in place of the is* utilities when migrating to this new version of the SDK.

validate* utility methods

As part of this update, the signature of the validate* utility methods was updated to properly describe the type of the value in case of success:

import { AppBskyEmbedImages } from '@atproto/api'

// Aliased for clarity
type Images = AppBskyEmbedImages.Main
const validateImages = AppBskyEmbedImages.validateMain

// Get some data somehow
declare const data: unknown

// Validate the data against a particular schema (images here)
const result = validateImages(data)

if (result.success) {
  // The `value` property was previously typed as `unknown` and is now properly typed as `Images`
  const images = result.value
}

Removal of the [x: string] index signature

Another consequence of atproto being an "open protocol" is that objects are allowed to contain additional, unspecified properties (though this should be done with caution to avoid incompatibility with properties added in the future). This used to be represented in the type system using a [k: string]: unknown index signature in generated interfaces. This is how the video embed was represented:

export interface Main {
  video: BlobRef
  captions?: Caption[]
  alt?: string
  aspectRatio?: AppBskyEmbedDefs.AspectRatio
  [k: string]: unknown
}

This signature allowed mistakes to go undetected:

import { AppBskyEmbedVideo } from '@atproto/api'

// Aliased for clarity
type Video = AppBskyEmbedVideo.Main

const embed: Video = {
  $type: 'app.bsky.embed.video',
  video: { /* omitted */ },
  // Notice the typo in `alt`, not resulting in a TypeScript error
  atl: 'My video',
}

We removed that signature; any unspecified fields added intentionally must now be explicitly marked as such:

import { AppBskyEmbedVideo } from '@atproto/api'

// Aliased for clarity
type Video = AppBskyEmbedVideo.Main

const embed: Video = {
  $type: 'app.bsky.embed.video',
  video: { /* omitted */ },
  // @ts-expect-error - custom field, prefixed to avoid clashes with future versions of the lexicon
  comExampleCustomProp: 'custom value', // OK thanks to the "ts-expect-error" directive
}

2024 Protocol Roadmap

· 11 min read

Discuss this post in our Github Discussion forums here

This roadmap is an update on our progress and lays out our general goals and focus for the coming months. This document is written for developers working on atproto clients, implementations, and applications (including Bluesky-specific projects). This is not a product announcement: while some product features are hinted at, we aren't promising specific timelines here. As always, most Bluesky software is free and open source, and observant folks can follow along with our progress week by week in GitHub.

In the big picture, we made a lot of progress on the protocol in early 2024. We opened up federation on the production network, demonstrated account migration, specified and launched stackable moderation (labeling and Ozone), shared our plan for OAuth, specified a generic proxying mechanism, built a new API documentation website (docs.bsky.app), and more.

After this big push on the protocol, the Bluesky engineering team is spending a few months catching up on some long-requested features like GIFs, video, and DMs. At the same time, we do have a few "enabling" pieces of protocol work underway, and continue to make progress towards a milestone of protocol maturity and stability.

Summary-level notes:

  • Federation is now open: you don't need to pre-register in Discord any more. 
  • It is increasingly possible to build independent apps and integrations on atproto. One early example is https://whtwnd.com/, a blogging web app built on atproto.
  • The timeline for a formal standards body process is being pushed back until we have additional independent active projects building on the protocol.

Current Work

Proxying of Independent Lexicons: earlier this year we added a generic HTTP proxying mechanism, which allows clients to specify which onward service (eg, AppView) instance they want to communicate with. To date this has been limited to known Lexicons, but we will soon relax this restriction and allow arbitrary XRPC query and procedure requests. Combined with support for records with independent Lexicon schemas (now allowed), this finally enables building new independent atproto applications. PR for this work
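Concretely, the proxying mechanism names the onward service with a DID plus a fragment identifying a service entry in that DID's document, passed as a request header. A minimal sketch; the atproto-proxy header name and did#service format reflect our reading of the proxying design, so verify against the spec before relying on them:

```typescript
// Build the service-proxying header: the value is the target service's
// DID plus a fragment naming the service entry in its DID document.
function proxyHeaders(serviceDid: string, serviceId: string): Record<string, string> {
  return { 'atproto-proxy': `${serviceDid}#${serviceId}` }
}

// e.g. ask one's PDS to forward an XRPC request to the Bluesky AppView:
proxyHeaders('did:web:api.bsky.app', 'bsky_appview')
// → { 'atproto-proxy': 'did:web:api.bsky.app#bsky_appview' }
```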

Open Federation: the Bluesky Relay service initially required pre-registration before new PDS instances were crawled. This was a very informal process (using Discord) to prevent automated abuse, but we have removed this requirement, making it even easier to set up PDS instances. We will also bump the per-PDS account limits, though we will still enforce some limits to minimize automated abuse; these limits can be bumped for rapidly growing communities and projects.

Email 2FA: while OAuth is our main focus for improving account security (OAuth flows will enable arbitrary MFA, including passkeys, hardware tokens, authenticators, etc), we are rapidly rolling out a basic form of 2FA, using an emailed code in addition to account password for sign-in. This will be an optional opt-in functionality. Announcement with details

OAuth: we continue to make progress implementing our plan for OAuth. Ultimately this will completely replace the current account sign-up, session, and app-password API endpoints, though we will maintain backwards compatibility for a long period. With OAuth, account lifecycle, sign-in, and permission flows will be implementation-specific web views. This means that PDS implementations can add any sign-up screening or MFA methods they see fit, without needing support in the com.atproto.* Lexicons. Detailed Proposal

Product Features

These are not directly protocol-related, but are likely to impact many developers, so we wanted to give a heads up on these.

Harassment Mitigations: additional controls and mechanisms to reduce the prevalence, visibility, and impact of abusive mentions and replies, particularly coming from newly created single-purpose or throw-away accounts. May expand on the existing thread-gating and reply-gating functionality.

Post Embeds: the ability to embed Bluesky posts in external public websites. Including oEmbed support. This has already shipped! See embed.bsky.app

Basic "Off-Protocol" Direct Messages (DMs): having some mechanism to privately contact other Bluesky accounts is the most requested product feature. We looked closely at alternatives like linking to external services, re-using an existing protocol like Matrix, or rushing out on-protocol encrypted DMs, but ultimately decided to launch a basic centralized system to take the time pressure off our team and make our user community happy. We intend to iterate and fully support E2EE DMs as part of atproto itself, without a centralized service, and will take the time to get the user experience, security, and privacy polished. This will be a distinct part of the protocol from the repository abstraction, which is only used for public content.

Better GIF and Video support: the first step is improving embeds from external platforms (like Tenor for GIFs, and YouTube for video). Both the post-creation flow and embed-view experience will be improved.

Feed Interaction Metrics: feed services currently have no feedback on how users are interacting with the content that they curate. There is no way for users to tell specific feeds that they want to see more or less of certain kinds of content, or whether they have already seen content. We are adding a new endpoint for clients to submit behavior metrics to feed generators as a feedback mechanism. This feedback will be most useful for personalized feeds, and less useful for topic or community-oriented feeds. It also raises privacy and efficiency concerns, so sending of this metadata will both be controlled by clients (optional), and will require feed generator opt-in in the feed declaration record.

Topic/Community Feeds: one of the more common uses for feed generators is to categorize content by topic or community. These feeds are not personalized (they look the same to all users), are not particularly "algorithmic" (posts are either in the feed or not), and often have relatively clear inclusion criteria (though they may be additionally curated or filtered). We are exploring ways to make it easier to create, curate, and explore this type of feed.

User/Labeler Messaging: currently, independent moderators have no private mechanism to communicate with accounts that have reported content, or accounts against which moderation actions have been taken. All reports, including appeals, are uni-directional, and accounts have no record of the reports they have submitted. While Bluesky can send notification emails to accounts hosted on our own PDS instance, this does not work cross-provider with self-hosted PDS instances or independent labelers.

Protocol Stability Milestone

A lot of progress has been made in recent months on the parts of the protocol relevant to large-scale public conversation. The core concepts of autonomous identity (DIDs and handles), self-certifying data (repositories), content curation (feed generators), and stackable moderation (labelers) have now all been demonstrated on the live network.

While we will continue to make progress on additional objectives (see below), we feel we are approaching a milestone in development and stability of these components of the protocol. There are a few smaller tasks to resolve towards this milestone.

Takedowns: we have a written proposal for how content and account takedowns will work across different pieces of infrastructure in the network. Takedowns are a stronger intervention that complement the labeling system. Bluesky already has mechanisms to enact takedowns on our own infrastructure when needed, but some details of how inter-provider takedown requests are communicated still need to be worked out.

Remaining Written Specifications: a few parts of the protocol have not been written up in the specifications at atproto.com.

Guidance on Building Apps and Integrations: while we hope the protocol will be adopted and built upon in unexpected ways, it would be helpful to have some basic pointers and advice on creating new applications and integrations. These will probably be informal tutorials and example code to start.

Account and Identity Firehose Events: while account and identity state are authoritatively managed across the DID, DNS, and PDS systems, it is efficient and helpful for changes to this state to be broadcast over the repository event stream ("firehose"). The semantics and behavior of the existing #identity event type will be updated and clarified, and an additional #account event type will be added to communicate PDS account deletion and takedown state to downstream services (Relay, and on to AppView, feed generator, labelers, etc). Downstream services might still need to resolve state from an authoritative source after being notified on the firehose.

Private Account Data Iteration: the app.bsky Lexicons currently include a preferences API, as well as some additional private state like mutes. The design of the current API is somewhat error-prone, difficult for independent developers to extend, and has unclear expectations around providing access to service providers (like independent AppViews). We are planning to iterate on this API, though it might not end up part of the near-term protocol milestone.

Protocol Tech Debt: there are a few other small technical issues to resolve or clean up; these are tracked in this GitHub discussion

On the Horizon

There are a few other pieces of protocol work which we are starting to plan out, but which are not currently scheduled to complete in 2024. It is very possible that priorities and schedules will be shuffled, but we mostly want to call these out as things we do want to complete, but will take a bit more time.

Protocol-Native DMs: as mentioned above, we want to have a "proper" DM solution as part of atproto, which is decentralized, E2EE, and follows modern security best practices.

Limited-Audience (Non-Public) Content: to start, we have prioritized the large-scale public conversation use cases in our protocol design, centered around the public data repository concept. While we support using the right tool for the job, and atproto is not trying to encompass every possible social modality, there are many situations and use-cases where having limited-audience content in the same overall application would be helpful. We intend to build a mechanism for group-private content sharing. It will likely be distinct from public data repositories and the Relay/firehose mechanism, but retain other parts of the protocol stack.

Firehose Bandwidth Efficiency: as the network grows, and the volume and rate of repository commits increases, the cost of subscribing to the entire Relay firehose increases. There are a number of ways to significantly improve bandwidth requirements: removing MST metadata for most use-cases; filtering by record types or subsets of accounts; batch compression; etc.

Record Versioning (Post Editing): atproto already supports updating records in repositories: one example is updating bsky profile records. And preparations were made early in the protocol design to support post editing while avoiding misleading edits. Ideally, it would also be possible to (optionally) keep old versions of records around in the repository, and allow referencing and accessing multiple versions of the same record.

PLC Transparency Log: we are exploring technical and organizational mechanisms to further de-centralize the DID PLC directory service. The most promising next step looks to be publishing a transparency log of all directory operations. This will make it easier for other organizations to audit the behavior of the directory and maintain verifiable replicas. The recent "tiling" transparency log design used for https://sunlight.dev/ (described here) is particularly promising. Compatibility with RFC 6962 (Certificate Transparency) could allow future integration with an existing ecosystem of witnesses and auditors.

Identity Key Self-Management UX: the DID PLC system has a concept of "rotation keys" to control the identity itself (in the form of the DID document). We would like to make it possible for users to optionally register additional keys on their personal devices, password managers, or hardware security keys. If done right, this should improve the resilience of the system and reduce some of the burden of responsibility on PDS operators. While this is technically possible today, it will require careful product design and security review to make this a safe and widely-adopted option.

Standards Body Timeline

As described in our 2023 Protocol Roadmap, we hope to bring atproto to an existing standards body to solidify governance and interoperability of the lower levels of the protocol. We had planned to start the formal process this summer, but as we talked to more people experienced with this process, we realized that we should wait until the design of the protocol has been explored by more developers. It would be ideal to have a couple organizations with atproto experience collaborate on the standards process together. If you are interested in being part of the atproto standards process, leave a message in the discussion thread for this post, or email protocol@blueskyweb.xyz.

While there has been a flowering of many projects built around the app.bsky microblogging application, there have been very few additional Lexicons and applications built from scratch. Some of this stemmed from restrictions on data schemas and proxying behavior on the Bluesky-hosted PDS instances, only relaxed just recently. We hope that new apps and Lexicons will exercise the full capabilities and corner-cases of the protocol.

We will continue to participate in adjacent standards efforts to make connections and get experience. Bluesky staff will attend IETF 120 in July, and are always happy to discuss responsible DNS integrations, OAuth, and HTTP API best practices.

Bluesky BGS and DID Document Formatting Changes

· 3 min read

We have a number of protocol and infrastructure changes rolling out in the next three months, and want to keep everybody in the loop.

This update was also emailed to the developer mailing list, which you can subscribe to here.

TL;DR

  • As of this week, the Bluesky AppView instance now consumes from a Bluesky BGS, instead of directly from the PDS. Devs can access the current streaming API at https://bsky.network/xrpc/com.atproto.sync.subscribeRepos or for WebSocket directly, wss://bsky.network/xrpc/com.atproto.sync.subscribeRepos
    Your existing cursor for bsky.social will not be in sync with bsky.network, so check the live stream first to grab a recent seq before connecting!
  • We are updating the DID document public key syntax to “Multikey” format next week on the main network PLC directory (plc.directory). This change is already live on the sandbox PLC directory.

How will this affect me?

  • For today, if you're consuming the firehose, grab a new cursor from bsky.network and restart your firehose consumer pointed at bsky.network.

Bluesky BGS

The Bluesky services themselves are moving to a federated deployment, with multiple Bluesky (the company) PDS instances aggregated by a BGS, and the AppView downstream of that. As of yesterday, the Bluesky AppView instance (api.bsky.app) consumes from a Bluesky PBC BGS (bsky.network), which consumes from the Bluesky PDS (bsky.social). Until now, the AppView consumed directly from the PDS.

How close are we to federation?

Technically, the main network BGS could start consuming from independent PDS instances today, the same as the sandbox BGS does. We have configured it not to do so until we finish implementing some more details, and do our own round of security hardening. If you want to bang on the BGS implementation (written in Go, code in the indigo github repository), please do so in the sandbox environment, not the main network.

This change impacts devs in two ways:

  • In the next couple weeks, new Bluesky (company) PDS instances will appear in the main network. Our plan is to optionally abstract this away for most client developers, so they can continue to connect to bsky.social as a virtual PDS. But the actual PDS hostnames will be distinct and will show up in DID documents.
  • Firehose consumers (feed generators, mirrors, metrics dashboards, etc) will need to switch over and consume from the BGS instead of the PDS directly. If they do not, they will miss content from the new (Bluesky) PDS instances.

The firehose subscription endpoint, which works as of today, is https://bsky.network/xrpc/com.atproto.sync.subscribeRepos (or wss:// for WebSocket directly). Note that this endpoint has different sequence numbers. When switching over, we recommend folks consume from both the BGS and PDS for a period to ensure no events are lost, or to scroll back the BGS cursor to ensure there is reasonable overlap in streams.
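Because sequence numbers differ between hosts, the cursor passed when (re)connecting must come from the new host. A small sketch of building the subscription URL, with the cursor as a query parameter:

```typescript
// Build a com.atproto.sync.subscribeRepos subscription URL; the optional
// cursor is the last sequence number seen *on that host* (cursors from
// bsky.social are not valid against bsky.network).
function subscribeReposUrl(host: string, cursor?: number): string {
  const url = new URL('/xrpc/com.atproto.sync.subscribeRepos', `wss://${host}`)
  if (cursor !== undefined) url.searchParams.set('cursor', String(cursor))
  return url.toString()
}

subscribeReposUrl('bsky.network', 123456)
// → 'wss://bsky.network/xrpc/com.atproto.sync.subscribeRepos?cursor=123456'
```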

We encourage developers and operators to switch to the BGS firehose sooner than later.

DID Document Formatting Changes

We also want to remind folks that we are planning to update the DID document public key syntax to “Multikey” format next week on the main network PLC directory (plc.directory). These changes are described here, with example documents for testing, and are live now on the sandbox PLC directory.

Rate Limits, PDS Distribution v3, and More

· 5 min read

To get future blog posts directly in your email, you can now subscribe to Bluesky’s Developer Mailing List here.

Adding Rate Limits

Now that we have a better sense of user activity on the network, we’re adding some application rate limits. This helps us keep the network secure — for example, by limiting the number of requests a user or bot can make in a given time period, it prevents bad actors from brute-forcing certain requests and helps us limit spammy behavior.

We’re adding a rate limit for the number of created actions per DID. These numbers shouldn’t affect typical Bluesky users, and won’t affect the majority of developers either, but it will affect prolific bots, such as the ones that follow every user or like every post on the network. The limit is 5,000 points per hour and 35,000 points per day, where:

Action Type    Value
CREATE         3 points
UPDATE         2 points
DELETE         1 point

To reiterate, these limits should be high enough not to affect any human users, but low enough to constrain abusive or spammy bots. We decided to release this new rate limit immediately, rather than giving developers advance notice, in order to secure the network from abusive behavior as soon as possible, especially since bad actors might take this blog post as an open invitation!

Per this system, an account may create at most 1,666 records per hour and 11,666 records per day. That means an account can like up to 1,666 records in one hour with no problem. We took the most active human users on the network into account when we set this threshold (you surpassed our expectations!).
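As a sanity check on that arithmetic, here is a small TypeScript sketch of the point accounting (the constants mirror the limits above; the helper names are illustrative, not part of any SDK):

```typescript
// Point costs per record operation, as described above.
const POINTS = { create: 3, update: 2, delete: 1 } as const;

const HOURLY_LIMIT = 5_000;
const DAILY_LIMIT = 35_000;

type Op = keyof typeof POINTS;

// Sum the point cost of a batch of operations.
function pointsFor(ops: Op[]): number {
  return ops.reduce((sum, op) => sum + POINTS[op], 0);
}

// Check accumulated usage against both windows.
function withinLimits(hourPoints: number, dayPoints: number): boolean {
  return hourPoints <= HOURLY_LIMIT && dayPoints <= DAILY_LIMIT;
}

// An account doing nothing but creates: floor(5000 / 3) = 1666 per hour.
const maxCreatesPerHour = Math.floor(HOURLY_LIMIT / POINTS.create);
```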

In case you missed it, in August, we added some other rate limits as well.

  • Global limit (aggregated across all routes)
    • Rate limited by IP
    • 3000/5 min
  • updateHandle
    • Rate limited by DID
    • 10/5 min
    • 50/day
  • createAccount
    • Rate limited by IP
    • 100/5 min
  • createSession
    • Rate limited by handle
    • 30/5 min
    • 300/day
  • deleteAccount
    • Rate limited by IP
    • 50/5 min
  • resetPassword
    • Rate limited by IP
    • 50/5 min

We’ll also return rate limit headers on each response so developers can dynamically adapt to these limits.
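As a sketch of how a client might react to those headers, the following assumes conventional `ratelimit-*` header names and semantics; check the actual headers your PDS returns before relying on them:

```typescript
// Parsed view of assumed rate limit headers (names are an assumption,
// following common ratelimit-header conventions).
interface RateLimitInfo {
  limit: number;     // total points in the window
  remaining: number; // points left
  resetAt: number;   // unix seconds when the window resets
}

function parseRateLimit(headers: Map<string, string>): RateLimitInfo | null {
  const limit = headers.get("ratelimit-limit");
  const remaining = headers.get("ratelimit-remaining");
  const reset = headers.get("ratelimit-reset");
  if (!limit || !remaining || !reset) return null;
  return {
    limit: Number(limit),
    remaining: Number(remaining),
    resetAt: Number(reset),
  };
}

// Simple policy: pause writes until reset when few points remain.
function shouldPause(info: RateLimitInfo, threshold = 10): boolean {
  return info.remaining < threshold;
}
```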

In a future update (in about a week), we’re also lowering the applyWrites limit from 200 to 10. This function applies a batch transaction of creates, updates, and deletes. This is part of the PDS distribution upgrade to v3 (read more below) — now that repos are ahistorical, we no longer need a higher limit to account for batch writes. applyWrites is used for transactional writes, and logic that requires more than 10 transactional records is rare.
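If you do have more than 10 writes, one option is to split them into batches of 10; note that each applyWrites call is then transactional only within its own batch. A minimal sketch:

```typescript
// Split a large set of writes into applyWrites-sized batches.
// Caveat: atomicity only holds within a single batch, not across batches.
function chunkWrites<T>(writes: T[], max = 10): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < writes.length; i += max) {
    batches.push(writes.slice(i, i + max));
  }
  return batches;
}
```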

PDS Distribution v3

We’re rolling out v3 of the PDS distribution. This shouldn’t be a breaking change, though we will be wiping the PLC sandbox. PDSs in parallel networks should still continue to operate with the new distribution.

Reminder: The PDS distribution auto-updates via the Watchtower companion Docker container, unless you specifically disabled that option. We’re adding the admin upgradeRepoVersion endpoint to the upgraded PDS distribution, so PDS admins can also upgrade their repos by hand.

Handle Invalidations on App View

Last month, we began proxying requests to the App View. In our federation architecture, the App View is the piece of the stack that gives you all your views of data, such as profiles and threads. Initially, we started out by serving all of these requests from our bsky.social PDS, but proxying these to the App View is one way of scaling our infrastructure to handle many more users on the network. (Read our federation architecture overview blog post for more information.)

For some users, this caused an invalid handle error. If you have an invalid handle, the user-facing UI will display this instead of your handle:

Screenshot of a profile with an invalid handle

You can use our debugging tool to investigate this: https://bsky-debug.app/handle. Just type your handle in. If it shows no error, please try updating your handle to the same handle you currently have to resolve this issue.

If the debugging page shows an error for your handle, follow this guide to make sure you set up your handle properly.

If that still isn’t working for you, file a support ticket through the app (“Help” button in the left menu on mobile or right side on desktop) and a Bluesky team member will assist you.

Subscribe for Developer Updates

You can subscribe to Bluesky’s Developer Mailing List here to receive future updates in your email. If you received your invite code from the developer waitlist, you’re already subscribed. Each email will have the option to unsubscribe.

We’ll continue to publish updates to our technical blog as well as on the app from @atproto.com.

Updates to Repository Sync Semantics

· 4 min read

We’re excited to announce that we’re rolling out a new version of atproto repositories that removes history from the canonical structure of repositories, and replaces it with a logical clock. We’ll start rolling out this update next week (August 28, 2023).

For most developers with projects subscribed to the firehose, such as feed generators, this change shouldn’t affect you. It will only affect you if you’re doing commit-aware repo sync (a good rule of thumb is whether you’ve ever passed earliest or latest to the com.atproto.sync.getRepo method) or are explicitly checking the repo version when processing commits.

Removing Repository History

Repositories on the AT Protocol are like Git repositories, but for structured records. Just like Git, each commit to an atproto repository currently includes a pointer to the previous commit. However, this approach has caused a couple of pain points:

  • Record deletions are difficult to process. If a user deletes a record, that commit needs to be erased from their repository to match their intent.
  • Increased storage cost. Maintaining repo history can cause anywhere from a 5-10x increase in repo size.

We attempted to resolve both of these in the current model through rebases (discrete moments when the history of a repository is deleted/mutated, like in Git). However, this is a tricky and sensitive operation that is expensive to conduct and complex to communicate across the network.

Using a Logical Clock for Repositories

To address the above issues, we’re replacing the prev pointer in commits with a logical clock. We originally published our intention to do so a few weeks ago. These are the changes we’re making to the way we handle repository history:

  • Incrementing the repo version to 3
  • Making the prev field on repo commits optional
  • Adding a new required rev (revision) field which is a logical clock
  • Removing or adjusting commit-aware repo sync mechanisms

Note: If you explicitly verify the version of a repo commit or do strict type checking on repo commits (which you shouldn’t — the spec allows unspecified fields!), you will need to make that check inclusive of version 3.

To facilitate backwards compatibility with software that is still running repo v2, we will continue setting the prev field on commits in the interim.

Even though we are setting the prev field, this can be considered a “hint” and the history is no longer considered a canonical part of the repository.
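Putting the commit changes together, a v3 commit might be modeled roughly like this in TypeScript (field shapes are simplified for illustration; consult the repository spec for the authoritative schema):

```typescript
// Simplified sketch of a v3 repo commit; real commits use CIDs and
// cryptographic signatures rather than plain strings.
interface RepoCommit {
  did: string;          // repository owner
  version: number;      // now 3
  prev?: string | null; // optional; still set in the interim as a hint
  rev: string;          // required TID-based logical clock
  data: string;         // root of the record tree (simplified to a string)
}

// Version checks must be inclusive of v3 (and tolerant of unknown fields).
function isSupportedCommitVersion(version: number): boolean {
  return version === 2 || version === 3;
}
```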


Repository Revisions

The new sync semantics for the repository rely on a logical clock included in each signed commit.

This “revision” takes the form of a TID and must be monotonically increasing.

The included revision serves a few functions:

Ordering

The clock provides a simple ordering mechanism for encountered repos or commits. If a consumer encounters the same repo from two different sources, each with a valid signature and structure, the revision gives a simple mechanism to determine which is the most recent repository.
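Since revisions are TIDs, which are fixed-length sortable strings, a plain lexicographic comparison should be enough to pick the more recent copy (an assumption worth verifying against the TID spec):

```typescript
// Given two valid copies of the same repo, prefer the one with the
// higher revision. TIDs are fixed-length and sort lexicographically,
// so ordinary string comparison reflects their ordering.
function laterRev(revA: string, revB: string): string {
  return revA >= revB ? revA : revB;
}
```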

Sync

When syncing a repository, revisions give a series of signposts that allow you to request everything in a given repo since a previously seen revision. Because revisions are ordered and monotonically increasing, the provider does not need the exact revision the consumer is asking for (as it would with a commit hash); instead, it can provide all repo contents since the latest revision it remembers that precedes the requested one.

The PDS for instance will track the revision at which each repo block or record was introduced into a repository. If a consumer asks for every block or record since a given revision, the PDS has a simple mechanism by which to give that information, without needing a complicated sync algorithm.
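Conceptually, this bookkeeping reduces "everything since revision X" to a simple filter. A hypothetical sketch:

```typescript
// Hypothetical PDS-side view: each record remembers the revision at
// which it was introduced into the repo.
interface StoredRecord {
  path: string; // e.g. "app.bsky.feed.post/3k2yihcrl6q2c"
  rev: string;  // revision at which this record entered the repo
}

// Everything introduced strictly after a given revision, relying on
// TIDs sorting lexicographically.
function recordsSince(records: StoredRecord[], sinceRev: string): StoredRecord[] {
  return records.filter((r) => r.rev > sinceRev);
}
```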

Stale Reads

Finally, a logical clock on the repo gives us a mechanism through which we can detect stale reads. (We actually already snuck this in with an optional revision field on v2 repos!)

Repo revisions may be returned in response headers to most requests. A client will know their own repo’s current revision and can compare that with the upstream service’s revision.

We use this today on the PDS to paper over some read-after-write concerns that are inherent in eventually consistent architectures. Some clients may use these headers to alert their users that their PDS is “out of sync” with other services in the network (for instance an AppView).
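A client-side staleness check might look like the following sketch; how the upstream service reports its revision (e.g. via a response header) is an assumption here:

```typescript
// A client that knows its own repo's current revision can compare it
// with the revision an upstream service (e.g. an AppView) reports.
function isStaleRead(clientRev: string, upstreamRev: string | null): boolean {
  if (upstreamRev === null) return false; // no signal; assume fresh
  // Upstream is stale if it has not yet seen the client's latest write.
  return upstreamRev < clientRev;
}
```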

Available sync methods


If you have questions about these changes, join us on GitHub Discussions here.