2022-09-26 18:08:19 -07:00
|
|
|
# frozen_string_literal: true
|
|
|
|
|
|
|
|
class Vacuum::StatusesVacuum
|
|
|
|
include Redisable
|
|
|
|
|
|
|
|
def initialize(retention_period)
|
|
|
|
@retention_period = retention_period
|
|
|
|
end
|
|
|
|
|
|
|
|
def perform
|
refactor(vacuum statuses): reduce amount of db queries and load for each query - improve performance (#21487)
* refactor(statuses_vacuum): remove dead code - unused
Method is not called inside class and private.
Clean up dead code.
* refactor(statuses_vacuum): make retention_period present test explicit
This private method only hides functionality.
It is best practice to be as explicit as possible.
* refactor(statuses_vacuum): improve query performance
- fix statuses_scope having sub-select for Account.remote scope by
`joins(:account).merge(Account.remote)`
- fix statuses_scope unnecessary use of `Status.arel_table[:id].lt`
because it is inexplicit, bad practice and even slower than normal
`.where('statuses.id < ?'`
- fix statuses_scope remove select(:id, :visibility) for having reusable
active record query batches (no re queries)
- fix vacuum_statuses! to use in_batches instead of find_in_batches,
because in_batches delivers a full blown active record query result,
in stead of an array - no requeries necessary
- send(:unlink_from_conversations) not to perform another db query, but
reuse the in_batches result instead.
- remove now obsolete remove_from_account_conversations method
- remove_from_search_index uses array of ids, instead of mapping
the ids from an array - this should be more efficient
- use the in_batches scope to call delete_all, instead of running
another db query for this - because it is again more efficient
- add TODO comment for calling models private method with send
* refactor(status): simplify unlink_from_conversations
- add `has_many through:` relation mentioned_accounts
- use model scope local instead of method call `Status#local?`
- more readable add account to inbox_owners when account.local?
* refactor(status): searchable_by way less sub selects
These queries all included a sub-select. Doing the same with a joins
should be more efficient.
Since this method does 5 such queries, this should be significant,
since it technically halves the query count.
This is how it was:
```ruby
[3] pry(main)> Status.first.mentions.where(account: Account.local, silent: false).explain
Status Load (1.6ms) SELECT "statuses".* FROM "statuses" WHERE "statuses"."deleted_at" IS NULL ORDER BY "statuses"."id" DESC LIMIT $1 [["LIMIT", 1]]
Mention Load (1.5ms) SELECT "mentions".* FROM "mentions" WHERE "mentions"."status_id" = $1 AND "mentions"."account_id" IN (SELECT "accounts"."id" FROM "accounts" WHERE "accounts"."domain" IS NULL) AND "mentions"."silent" = $2 [["status_id", 109382923142288414], ["silent", false]]
=> EXPLAIN for: SELECT "mentions".* FROM "mentions" WHERE "mentions"."status_id" = $1 AND "mentions"."account_id" IN (SELECT "accounts"."id" FROM "accounts" WHERE "accounts"."domain" IS NULL) AND "mentions"."silent" = $2 [["status_id", 109382923142288414], ["silent", false]]
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.15..23.08 rows=1 width=41)
-> Seq Scan on accounts (cost=0.00..10.90 rows=1 width=8)
Filter: (domain IS NULL)
-> Index Scan using index_mentions_on_account_id_and_status_id on mentions (cost=0.15..8.17 rows=1 width=41)
Index Cond: ((account_id = accounts.id) AND (status_id = '109382923142288414'::bigint))
Filter: (NOT silent)
(6 rows)
```
This is how it is with this change:
```ruby
[4] pry(main)> Status.first.mentions.joins(:account).merge(Account.local).active.explain
Status Load (1.7ms) SELECT "statuses".* FROM "statuses" WHERE "statuses"."deleted_at" IS NULL ORDER BY "statuses"."id" DESC LIMIT $1 [["LIMIT", 1]]
Mention Load (0.7ms) SELECT "mentions".* FROM "mentions" INNER JOIN "accounts" ON "accounts"."id" = "mentions"."account_id" WHERE "mentions"."status_id" = $1 AND "accounts"."domain" IS NULL AND "mentions"."silent" = $2 [["status_id", 109382923142288414], ["silent", false]]
=> EXPLAIN for: SELECT "mentions".* FROM "mentions" INNER JOIN "accounts" ON "accounts"."id" = "mentions"."account_id" WHERE "mentions"."status_id" = $1 AND "accounts"."domain" IS NULL AND "mentions"."silent" = $2 [["status_id", 109382923142288414], ["silent", false]]
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.15..23.08 rows=1 width=41)
-> Seq Scan on accounts (cost=0.00..10.90 rows=1 width=8)
Filter: (domain IS NULL)
-> Index Scan using index_mentions_on_account_id_and_status_id on mentions (cost=0.15..8.17 rows=1 width=41)
Index Cond: ((account_id = accounts.id) AND (status_id = '109382923142288414'::bigint))
Filter: (NOT silent)
(6 rows)
```
2022-11-27 11:41:18 -08:00
|
|
|
vacuum_statuses! if @retention_period.present?
|
2022-09-26 18:08:19 -07:00
|
|
|
end
|
|
|
|
|
|
|
|
private
|
|
|
|
|
|
|
|
def vacuum_statuses!
|
refactor(vacuum statuses): reduce amount of db queries and load for each query - improve performance (#21487)
* refactor(statuses_vacuum): remove dead code - unused
Method is not called inside class and private.
Clean up dead code.
* refactor(statuses_vacuum): make retention_period present test explicit
This private method only hides functionality.
It is best practice to be as explicit as possible.
* refactor(statuses_vacuum): improve query performance
- fix statuses_scope having sub-select for Account.remote scope by
`joins(:account).merge(Account.remote)`
- fix statuses_scope unnecessary use of `Status.arel_table[:id].lt`
because it is inexplicit, bad practice and even slower than normal
`.where('statuses.id < ?'`
- fix statuses_scope remove select(:id, :visibility) for having reusable
active record query batches (no re queries)
- fix vacuum_statuses! to use in_batches instead of find_in_batches,
because in_batches delivers a full blown active record query result,
in stead of an array - no requeries necessary
- send(:unlink_from_conversations) not to perform another db query, but
reuse the in_batches result instead.
- remove now obsolete remove_from_account_conversations method
- remove_from_search_index uses array of ids, instead of mapping
the ids from an array - this should be more efficient
- use the in_batches scope to call delete_all, instead of running
another db query for this - because it is again more efficient
- add TODO comment for calling models private method with send
* refactor(status): simplify unlink_from_conversations
- add `has_many through:` relation mentioned_accounts
- use model scope local instead of method call `Status#local?`
- more readable add account to inbox_owners when account.local?
* refactor(status): searchable_by way less sub selects
These queries all included a sub-select. Doing the same with a joins
should be more efficient.
Since this method does 5 such queries, this should be significant,
since it technically halves the query count.
This is how it was:
```ruby
[3] pry(main)> Status.first.mentions.where(account: Account.local, silent: false).explain
Status Load (1.6ms) SELECT "statuses".* FROM "statuses" WHERE "statuses"."deleted_at" IS NULL ORDER BY "statuses"."id" DESC LIMIT $1 [["LIMIT", 1]]
Mention Load (1.5ms) SELECT "mentions".* FROM "mentions" WHERE "mentions"."status_id" = $1 AND "mentions"."account_id" IN (SELECT "accounts"."id" FROM "accounts" WHERE "accounts"."domain" IS NULL) AND "mentions"."silent" = $2 [["status_id", 109382923142288414], ["silent", false]]
=> EXPLAIN for: SELECT "mentions".* FROM "mentions" WHERE "mentions"."status_id" = $1 AND "mentions"."account_id" IN (SELECT "accounts"."id" FROM "accounts" WHERE "accounts"."domain" IS NULL) AND "mentions"."silent" = $2 [["status_id", 109382923142288414], ["silent", false]]
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.15..23.08 rows=1 width=41)
-> Seq Scan on accounts (cost=0.00..10.90 rows=1 width=8)
Filter: (domain IS NULL)
-> Index Scan using index_mentions_on_account_id_and_status_id on mentions (cost=0.15..8.17 rows=1 width=41)
Index Cond: ((account_id = accounts.id) AND (status_id = '109382923142288414'::bigint))
Filter: (NOT silent)
(6 rows)
```
This is how it is with this change:
```ruby
[4] pry(main)> Status.first.mentions.joins(:account).merge(Account.local).active.explain
Status Load (1.7ms) SELECT "statuses".* FROM "statuses" WHERE "statuses"."deleted_at" IS NULL ORDER BY "statuses"."id" DESC LIMIT $1 [["LIMIT", 1]]
Mention Load (0.7ms) SELECT "mentions".* FROM "mentions" INNER JOIN "accounts" ON "accounts"."id" = "mentions"."account_id" WHERE "mentions"."status_id" = $1 AND "accounts"."domain" IS NULL AND "mentions"."silent" = $2 [["status_id", 109382923142288414], ["silent", false]]
=> EXPLAIN for: SELECT "mentions".* FROM "mentions" INNER JOIN "accounts" ON "accounts"."id" = "mentions"."account_id" WHERE "mentions"."status_id" = $1 AND "accounts"."domain" IS NULL AND "mentions"."silent" = $2 [["status_id", 109382923142288414], ["silent", false]]
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.15..23.08 rows=1 width=41)
-> Seq Scan on accounts (cost=0.00..10.90 rows=1 width=8)
Filter: (domain IS NULL)
-> Index Scan using index_mentions_on_account_id_and_status_id on mentions (cost=0.15..8.17 rows=1 width=41)
Index Cond: ((account_id = accounts.id) AND (status_id = '109382923142288414'::bigint))
Filter: (NOT silent)
(6 rows)
```
2022-11-27 11:41:18 -08:00
|
|
|
statuses_scope.in_batches do |statuses|
|
2022-09-26 18:08:19 -07:00
|
|
|
# Side-effects not covered by foreign keys, such
|
|
|
|
# as the search index, must be handled first.
|
refactor(vacuum statuses): reduce amount of db queries and load for each query - improve performance (#21487)
* refactor(statuses_vacuum): remove dead code - unused
Method is not called inside class and private.
Clean up dead code.
* refactor(statuses_vacuum): make retention_period present test explicit
This private method only hides functionality.
It is best practice to be as explicit as possible.
* refactor(statuses_vacuum): improve query performance
- fix statuses_scope having sub-select for Account.remote scope by
`joins(:account).merge(Account.remote)`
- fix statuses_scope unnecessary use of `Status.arel_table[:id].lt`
because it is inexplicit, bad practice and even slower than normal
`.where('statuses.id < ?'`
- fix statuses_scope remove select(:id, :visibility) for having reusable
active record query batches (no re queries)
- fix vacuum_statuses! to use in_batches instead of find_in_batches,
because in_batches delivers a full blown active record query result,
in stead of an array - no requeries necessary
- send(:unlink_from_conversations) not to perform another db query, but
reuse the in_batches result instead.
- remove now obsolete remove_from_account_conversations method
- remove_from_search_index uses array of ids, instead of mapping
the ids from an array - this should be more efficient
- use the in_batches scope to call delete_all, instead of running
another db query for this - because it is again more efficient
- add TODO comment for calling models private method with send
* refactor(status): simplify unlink_from_conversations
- add `has_many through:` relation mentioned_accounts
- use model scope local instead of method call `Status#local?`
- more readable add account to inbox_owners when account.local?
* refactor(status): searchable_by way less sub selects
These queries all included a sub-select. Doing the same with a joins
should be more efficient.
Since this method does 5 such queries, this should be significant,
since it technically halves the query count.
This is how it was:
```ruby
[3] pry(main)> Status.first.mentions.where(account: Account.local, silent: false).explain
Status Load (1.6ms) SELECT "statuses".* FROM "statuses" WHERE "statuses"."deleted_at" IS NULL ORDER BY "statuses"."id" DESC LIMIT $1 [["LIMIT", 1]]
Mention Load (1.5ms) SELECT "mentions".* FROM "mentions" WHERE "mentions"."status_id" = $1 AND "mentions"."account_id" IN (SELECT "accounts"."id" FROM "accounts" WHERE "accounts"."domain" IS NULL) AND "mentions"."silent" = $2 [["status_id", 109382923142288414], ["silent", false]]
=> EXPLAIN for: SELECT "mentions".* FROM "mentions" WHERE "mentions"."status_id" = $1 AND "mentions"."account_id" IN (SELECT "accounts"."id" FROM "accounts" WHERE "accounts"."domain" IS NULL) AND "mentions"."silent" = $2 [["status_id", 109382923142288414], ["silent", false]]
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.15..23.08 rows=1 width=41)
-> Seq Scan on accounts (cost=0.00..10.90 rows=1 width=8)
Filter: (domain IS NULL)
-> Index Scan using index_mentions_on_account_id_and_status_id on mentions (cost=0.15..8.17 rows=1 width=41)
Index Cond: ((account_id = accounts.id) AND (status_id = '109382923142288414'::bigint))
Filter: (NOT silent)
(6 rows)
```
This is how it is with this change:
```ruby
[4] pry(main)> Status.first.mentions.joins(:account).merge(Account.local).active.explain
Status Load (1.7ms) SELECT "statuses".* FROM "statuses" WHERE "statuses"."deleted_at" IS NULL ORDER BY "statuses"."id" DESC LIMIT $1 [["LIMIT", 1]]
Mention Load (0.7ms) SELECT "mentions".* FROM "mentions" INNER JOIN "accounts" ON "accounts"."id" = "mentions"."account_id" WHERE "mentions"."status_id" = $1 AND "accounts"."domain" IS NULL AND "mentions"."silent" = $2 [["status_id", 109382923142288414], ["silent", false]]
=> EXPLAIN for: SELECT "mentions".* FROM "mentions" INNER JOIN "accounts" ON "accounts"."id" = "mentions"."account_id" WHERE "mentions"."status_id" = $1 AND "accounts"."domain" IS NULL AND "mentions"."silent" = $2 [["status_id", 109382923142288414], ["silent", false]]
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.15..23.08 rows=1 width=41)
-> Seq Scan on accounts (cost=0.00..10.90 rows=1 width=8)
Filter: (domain IS NULL)
-> Index Scan using index_mentions_on_account_id_and_status_id on mentions (cost=0.15..8.17 rows=1 width=41)
Index Cond: ((account_id = accounts.id) AND (status_id = '109382923142288414'::bigint))
Filter: (NOT silent)
(6 rows)
```
2022-11-27 11:41:18 -08:00
|
|
|
statuses.direct_visibility
|
|
|
|
.includes(mentions: :account)
|
2023-01-11 12:57:24 -08:00
|
|
|
.find_each(&:unlink_from_conversations!)
|
refactor(vacuum statuses): reduce amount of db queries and load for each query - improve performance (#21487)
* refactor(statuses_vacuum): remove dead code - unused
Method is not called inside class and private.
Clean up dead code.
* refactor(statuses_vacuum): make retention_period present test explicit
This private method only hides functionality.
It is best practice to be as explicit as possible.
* refactor(statuses_vacuum): improve query performance
- fix statuses_scope having sub-select for Account.remote scope by
`joins(:account).merge(Account.remote)`
- fix statuses_scope unnecessary use of `Status.arel_table[:id].lt`
because it is inexplicit, bad practice and even slower than normal
`.where('statuses.id < ?'`
- fix statuses_scope remove select(:id, :visibility) for having reusable
active record query batches (no re queries)
- fix vacuum_statuses! to use in_batches instead of find_in_batches,
because in_batches delivers a full blown active record query result,
in stead of an array - no requeries necessary
- send(:unlink_from_conversations) not to perform another db query, but
reuse the in_batches result instead.
- remove now obsolete remove_from_account_conversations method
- remove_from_search_index uses array of ids, instead of mapping
the ids from an array - this should be more efficient
- use the in_batches scope to call delete_all, instead of running
another db query for this - because it is again more efficient
- add TODO comment for calling models private method with send
* refactor(status): simplify unlink_from_conversations
- add `has_many through:` relation mentioned_accounts
- use model scope local instead of method call `Status#local?`
- more readable add account to inbox_owners when account.local?
* refactor(status): searchable_by way less sub selects
These queries all included a sub-select. Doing the same with a joins
should be more efficient.
Since this method does 5 such queries, this should be significant,
since it technically halves the query count.
This is how it was:
```ruby
[3] pry(main)> Status.first.mentions.where(account: Account.local, silent: false).explain
Status Load (1.6ms) SELECT "statuses".* FROM "statuses" WHERE "statuses"."deleted_at" IS NULL ORDER BY "statuses"."id" DESC LIMIT $1 [["LIMIT", 1]]
Mention Load (1.5ms) SELECT "mentions".* FROM "mentions" WHERE "mentions"."status_id" = $1 AND "mentions"."account_id" IN (SELECT "accounts"."id" FROM "accounts" WHERE "accounts"."domain" IS NULL) AND "mentions"."silent" = $2 [["status_id", 109382923142288414], ["silent", false]]
=> EXPLAIN for: SELECT "mentions".* FROM "mentions" WHERE "mentions"."status_id" = $1 AND "mentions"."account_id" IN (SELECT "accounts"."id" FROM "accounts" WHERE "accounts"."domain" IS NULL) AND "mentions"."silent" = $2 [["status_id", 109382923142288414], ["silent", false]]
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.15..23.08 rows=1 width=41)
-> Seq Scan on accounts (cost=0.00..10.90 rows=1 width=8)
Filter: (domain IS NULL)
-> Index Scan using index_mentions_on_account_id_and_status_id on mentions (cost=0.15..8.17 rows=1 width=41)
Index Cond: ((account_id = accounts.id) AND (status_id = '109382923142288414'::bigint))
Filter: (NOT silent)
(6 rows)
```
This is how it is with this change:
```ruby
[4] pry(main)> Status.first.mentions.joins(:account).merge(Account.local).active.explain
Status Load (1.7ms) SELECT "statuses".* FROM "statuses" WHERE "statuses"."deleted_at" IS NULL ORDER BY "statuses"."id" DESC LIMIT $1 [["LIMIT", 1]]
Mention Load (0.7ms) SELECT "mentions".* FROM "mentions" INNER JOIN "accounts" ON "accounts"."id" = "mentions"."account_id" WHERE "mentions"."status_id" = $1 AND "accounts"."domain" IS NULL AND "mentions"."silent" = $2 [["status_id", 109382923142288414], ["silent", false]]
=> EXPLAIN for: SELECT "mentions".* FROM "mentions" INNER JOIN "accounts" ON "accounts"."id" = "mentions"."account_id" WHERE "mentions"."status_id" = $1 AND "accounts"."domain" IS NULL AND "mentions"."silent" = $2 [["status_id", 109382923142288414], ["silent", false]]
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.15..23.08 rows=1 width=41)
-> Seq Scan on accounts (cost=0.00..10.90 rows=1 width=8)
Filter: (domain IS NULL)
-> Index Scan using index_mentions_on_account_id_and_status_id on mentions (cost=0.15..8.17 rows=1 width=41)
Index Cond: ((account_id = accounts.id) AND (status_id = '109382923142288414'::bigint))
Filter: (NOT silent)
(6 rows)
```
2022-11-27 11:41:18 -08:00
|
|
|
remove_from_search_index(statuses.ids) if Chewy.enabled?
|
|
|
|
|
|
|
|
# Foreign keys take care of most associated records for us.
|
|
|
|
# Media attachments will be orphaned.
|
|
|
|
statuses.delete_all
|
2022-09-26 18:08:19 -07:00
|
|
|
end
|
|
|
|
end
|
|
|
|
|
|
|
|
def statuses_scope
|
refactor(vacuum statuses): reduce amount of db queries and load for each query - improve performance (#21487)
* refactor(statuses_vacuum): remove dead code - unused
Method is not called inside class and private.
Clean up dead code.
* refactor(statuses_vacuum): make retention_period present test explicit
This private method only hides functionality.
It is best practice to be as explicit as possible.
* refactor(statuses_vacuum): improve query performance
- fix statuses_scope having sub-select for Account.remote scope by
`joins(:account).merge(Account.remote)`
- fix statuses_scope unnecessary use of `Status.arel_table[:id].lt`
because it is inexplicit, bad practice and even slower than normal
`.where('statuses.id < ?'`
- fix statuses_scope remove select(:id, :visibility) for having reusable
active record query batches (no re queries)
- fix vacuum_statuses! to use in_batches instead of find_in_batches,
because in_batches delivers a full blown active record query result,
in stead of an array - no requeries necessary
- send(:unlink_from_conversations) not to perform another db query, but
reuse the in_batches result instead.
- remove now obsolete remove_from_account_conversations method
- remove_from_search_index uses array of ids, instead of mapping
the ids from an array - this should be more efficient
- use the in_batches scope to call delete_all, instead of running
another db query for this - because it is again more efficient
- add TODO comment for calling models private method with send
* refactor(status): simplify unlink_from_conversations
- add `has_many through:` relation mentioned_accounts
- use model scope local instead of method call `Status#local?`
- more readable add account to inbox_owners when account.local?
* refactor(status): searchable_by way less sub selects
These queries all included a sub-select. Doing the same with a joins
should be more efficient.
Since this method does 5 such queries, this should be significant,
since it technically halves the query count.
This is how it was:
```ruby
[3] pry(main)> Status.first.mentions.where(account: Account.local, silent: false).explain
Status Load (1.6ms) SELECT "statuses".* FROM "statuses" WHERE "statuses"."deleted_at" IS NULL ORDER BY "statuses"."id" DESC LIMIT $1 [["LIMIT", 1]]
Mention Load (1.5ms) SELECT "mentions".* FROM "mentions" WHERE "mentions"."status_id" = $1 AND "mentions"."account_id" IN (SELECT "accounts"."id" FROM "accounts" WHERE "accounts"."domain" IS NULL) AND "mentions"."silent" = $2 [["status_id", 109382923142288414], ["silent", false]]
=> EXPLAIN for: SELECT "mentions".* FROM "mentions" WHERE "mentions"."status_id" = $1 AND "mentions"."account_id" IN (SELECT "accounts"."id" FROM "accounts" WHERE "accounts"."domain" IS NULL) AND "mentions"."silent" = $2 [["status_id", 109382923142288414], ["silent", false]]
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.15..23.08 rows=1 width=41)
-> Seq Scan on accounts (cost=0.00..10.90 rows=1 width=8)
Filter: (domain IS NULL)
-> Index Scan using index_mentions_on_account_id_and_status_id on mentions (cost=0.15..8.17 rows=1 width=41)
Index Cond: ((account_id = accounts.id) AND (status_id = '109382923142288414'::bigint))
Filter: (NOT silent)
(6 rows)
```
This is how it is with this change:
```ruby
[4] pry(main)> Status.first.mentions.joins(:account).merge(Account.local).active.explain
Status Load (1.7ms) SELECT "statuses".* FROM "statuses" WHERE "statuses"."deleted_at" IS NULL ORDER BY "statuses"."id" DESC LIMIT $1 [["LIMIT", 1]]
Mention Load (0.7ms) SELECT "mentions".* FROM "mentions" INNER JOIN "accounts" ON "accounts"."id" = "mentions"."account_id" WHERE "mentions"."status_id" = $1 AND "accounts"."domain" IS NULL AND "mentions"."silent" = $2 [["status_id", 109382923142288414], ["silent", false]]
=> EXPLAIN for: SELECT "mentions".* FROM "mentions" INNER JOIN "accounts" ON "accounts"."id" = "mentions"."account_id" WHERE "mentions"."status_id" = $1 AND "accounts"."domain" IS NULL AND "mentions"."silent" = $2 [["status_id", 109382923142288414], ["silent", false]]
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.15..23.08 rows=1 width=41)
-> Seq Scan on accounts (cost=0.00..10.90 rows=1 width=8)
Filter: (domain IS NULL)
-> Index Scan using index_mentions_on_account_id_and_status_id on mentions (cost=0.15..8.17 rows=1 width=41)
Index Cond: ((account_id = accounts.id) AND (status_id = '109382923142288414'::bigint))
Filter: (NOT silent)
(6 rows)
```
2022-11-27 11:41:18 -08:00
|
|
|
Status.unscoped.kept
|
|
|
|
.joins(:account).merge(Account.remote)
|
|
|
|
.where('statuses.id < ?', retention_period_as_id)
|
2022-09-26 18:08:19 -07:00
|
|
|
end
|
|
|
|
|
|
|
|
def retention_period_as_id
|
|
|
|
Mastodon::Snowflake.id_at(@retention_period.ago, with_random: false)
|
|
|
|
end
|
|
|
|
|
refactor(vacuum statuses): reduce amount of db queries and load for each query - improve performance (#21487)
* refactor(statuses_vacuum): remove dead code - unused
Method is not called inside class and private.
Clean up dead code.
* refactor(statuses_vacuum): make retention_period present test explicit
This private method only hides functionality.
It is best practice to be as explicit as possible.
* refactor(statuses_vacuum): improve query performance
- fix statuses_scope having sub-select for Account.remote scope by
`joins(:account).merge(Account.remote)`
- fix statuses_scope unnecessary use of `Status.arel_table[:id].lt`
because it is inexplicit, bad practice and even slower than normal
`.where('statuses.id < ?'`
- fix statuses_scope remove select(:id, :visibility) for having reusable
active record query batches (no re queries)
- fix vacuum_statuses! to use in_batches instead of find_in_batches,
because in_batches delivers a full blown active record query result,
in stead of an array - no requeries necessary
- send(:unlink_from_conversations) not to perform another db query, but
reuse the in_batches result instead.
- remove now obsolete remove_from_account_conversations method
- remove_from_search_index uses array of ids, instead of mapping
the ids from an array - this should be more efficient
- use the in_batches scope to call delete_all, instead of running
another db query for this - because it is again more efficient
- add TODO comment for calling models private method with send
* refactor(status): simplify unlink_from_conversations
- add `has_many through:` relation mentioned_accounts
- use model scope local instead of method call `Status#local?`
- more readable add account to inbox_owners when account.local?
* refactor(status): searchable_by way less sub selects
These queries all included a sub-select. Doing the same with a joins
should be more efficient.
Since this method does 5 such queries, this should be significant,
since it technically halves the query count.
This is how it was:
```ruby
[3] pry(main)> Status.first.mentions.where(account: Account.local, silent: false).explain
Status Load (1.6ms) SELECT "statuses".* FROM "statuses" WHERE "statuses"."deleted_at" IS NULL ORDER BY "statuses"."id" DESC LIMIT $1 [["LIMIT", 1]]
Mention Load (1.5ms) SELECT "mentions".* FROM "mentions" WHERE "mentions"."status_id" = $1 AND "mentions"."account_id" IN (SELECT "accounts"."id" FROM "accounts" WHERE "accounts"."domain" IS NULL) AND "mentions"."silent" = $2 [["status_id", 109382923142288414], ["silent", false]]
=> EXPLAIN for: SELECT "mentions".* FROM "mentions" WHERE "mentions"."status_id" = $1 AND "mentions"."account_id" IN (SELECT "accounts"."id" FROM "accounts" WHERE "accounts"."domain" IS NULL) AND "mentions"."silent" = $2 [["status_id", 109382923142288414], ["silent", false]]
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.15..23.08 rows=1 width=41)
-> Seq Scan on accounts (cost=0.00..10.90 rows=1 width=8)
Filter: (domain IS NULL)
-> Index Scan using index_mentions_on_account_id_and_status_id on mentions (cost=0.15..8.17 rows=1 width=41)
Index Cond: ((account_id = accounts.id) AND (status_id = '109382923142288414'::bigint))
Filter: (NOT silent)
(6 rows)
```
This is how it is with this change:
```ruby
[4] pry(main)> Status.first.mentions.joins(:account).merge(Account.local).active.explain
Status Load (1.7ms) SELECT "statuses".* FROM "statuses" WHERE "statuses"."deleted_at" IS NULL ORDER BY "statuses"."id" DESC LIMIT $1 [["LIMIT", 1]]
Mention Load (0.7ms) SELECT "mentions".* FROM "mentions" INNER JOIN "accounts" ON "accounts"."id" = "mentions"."account_id" WHERE "mentions"."status_id" = $1 AND "accounts"."domain" IS NULL AND "mentions"."silent" = $2 [["status_id", 109382923142288414], ["silent", false]]
=> EXPLAIN for: SELECT "mentions".* FROM "mentions" INNER JOIN "accounts" ON "accounts"."id" = "mentions"."account_id" WHERE "mentions"."status_id" = $1 AND "accounts"."domain" IS NULL AND "mentions"."silent" = $2 [["status_id", 109382923142288414], ["silent", false]]
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.15..23.08 rows=1 width=41)
-> Seq Scan on accounts (cost=0.00..10.90 rows=1 width=8)
Filter: (domain IS NULL)
-> Index Scan using index_mentions_on_account_id_and_status_id on mentions (cost=0.15..8.17 rows=1 width=41)
Index Cond: ((account_id = accounts.id) AND (status_id = '109382923142288414'::bigint))
Filter: (NOT silent)
(6 rows)
```
2022-11-27 11:41:18 -08:00
|
|
|
def remove_from_search_index(status_ids)
|
|
|
|
with_redis { |redis| redis.sadd('chewy:queue:StatusesIndex', status_ids) }
|
2022-09-26 18:08:19 -07:00
|
|
|
end
|
|
|
|
end
|