Bug 1346567 - Rewrite inefficient queries in cycle_non_job_data()

Previously the select was evaluated independently of the delete,
causing timeouts when attempting to return 2.6 million machine ids
from the jobs table.

Now the select queryset isn't evaluated on its own, and instead is only
used to generate the subquery in, e.g.:
  SELECT `machine`.`id`, `machine`.`name` FROM `machine` WHERE NOT
        (`machine`.`id` IN (SELECT U0.`machine_id` FROM `job` U0));
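
The difference between the two query shapes can be sketched outside Django. The following is a minimal illustration using Python's built-in sqlite3 module rather than the Django ORM or the project's real database; the table layout is reduced to the columns named in the SQL above, the function names are hypothetical, and the subquery variant is collapsed into a single DELETE, whereas Django may first issue a SELECT like the one shown.

```python
import sqlite3


def build_db():
    """Create an in-memory database with a few machines and jobs (toy data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE machine (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE job (id INTEGER PRIMARY KEY, machine_id INTEGER NOT NULL);
        INSERT INTO machine (id, name) VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
        INSERT INTO job (machine_id) VALUES (1), (1), (3);
        """
    )
    return conn


def delete_unused_materialized(conn):
    """Old shape: pull every used id into Python, then send them all back."""
    used = {row[0] for row in conn.execute("SELECT DISTINCT machine_id FROM job")}
    placeholders = ",".join("?" for _ in used)
    cur = conn.execute(
        f"DELETE FROM machine WHERE id NOT IN ({placeholders})", tuple(used)
    )
    return cur.rowcount


def delete_unused_subquery(conn):
    """New shape: the database resolves the used ids itself in a subquery."""
    cur = conn.execute(
        "DELETE FROM machine WHERE id NOT IN (SELECT machine_id FROM job)"
    )
    return cur.rowcount


def remaining_ids(conn):
    """Return the machine ids still present, in ascending order."""
    return [row[0] for row in conn.execute("SELECT id FROM machine ORDER BY id")]
```

Both functions remove the same rows; the first transfers the full id list to the client and back as query parameters, which is the step that becomes expensive when the list has millions of entries.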
Ed Morley 2017-03-12 22:21:36 +00:00
Parent 73670ac626
Commit b12defb0cb
1 changed file: 3 additions and 8 deletions


@@ -72,18 +72,13 @@ class Command(BaseCommand):
         self.cycle_non_job_data(options['chunk_size'], options['sleep_time'])
     def cycle_non_job_data(self, chunk_size, sleep_time):
-        (used_job_type_ids, used_machine_ids) = (set(), set())
-        used_job_type_ids = set(Job.objects.values_list(
-            'job_type_id', flat=True).distinct())
-        used_machine_ids = set(Job.objects.values_list(
-            'machine_id', flat=True).distinct())
+        used_job_type_ids = Job.objects.values('job_type_id').distinct()
         JobType.objects.exclude(id__in=used_job_type_ids).delete()
-        used_job_group_ids = set(JobType.objects.values_list(
-            'job_group', flat=True).distinct())
+        used_job_group_ids = JobType.objects.values('job_group').distinct()
         JobGroup.objects.exclude(id__in=used_job_group_ids).delete()
+        used_machine_ids = Job.objects.values('machine_id').distinct()
         Machine.objects.exclude(id__in=used_machine_ids).delete()
     def debug(self, msg):