In our application we run Sidekiq, a background job processor that lets us run hundreds of idempotent jobs against the database in parallel.
We have one job that can perform up to 3,000 inserts into a very large table (60 columns, 7 indexes, 1.5 billion rows). Currently we have this set up as a direct load.
The CSV data is built in an array (never written to a file) and fed to COPY's STDIN like so:
ActiveRecord::Base.connection_pool.with_connection do |conn|
  rc = conn.raw_connection
  # Stream each in-memory CSV line straight into the table via COPY
  rc.copy_data "COPY apps FROM STDIN WITH CSV HEADER" do
    rows.each do |line|
      rc.put_copy_data line
    end
  end
end
To generate the data for this direct load, we have to hit the table's sequence once for each row. If 200 of these jobs are running simultaneously with up to 3,000 records each, that is on the order of 600,000 sequence calls, which seems like a lot.
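(If the per-row sequence hits turn out to be the concern, one workaround we have considered is reserving a whole block of values in a single query. This is only a sketch, assuming the backing sequence is named apps_id_seq and that rows holds the CSV lines before their primary keys are prepended:

ids = ActiveRecord::Base.connection_pool.with_connection do |conn|
  # Reserve all the ids for one batch in a single round trip
  # instead of one nextval call per row.
  conn.select_values(
    "SELECT nextval('apps_id_seq') FROM generate_series(1, #{rows.size})"
  )
end

# Prepend each pre-fetched id to its CSV line before the COPY.
rows = rows.zip(ids).map { |line, id| "#{id},#{line}" }

That would keep the COPY itself unchanged while cutting the sequence round trips from 3,000 per job to one.)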
Is this strategy a bad idea? All other things being equal, would we get better throughput by doing individual inserts instead?