WSL2-Linux-Kernel

Граф коммитов

Автор	SHA1	Сообщение	Дата
Rafael J. Wysocki	6e96c5b3ac	cpufreq: ondemand: Simplify conditionals in od_dbs_timer() Reduce the indentation level in the conditionals in od_dbs_timer() and drop the delay variable from it. No functional changes. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-03-09 14:41:05 +01:00
Rafael J. Wysocki	57dc3bcd45	cpufreq: governor: Move rate_mult to struct policy_dbs The rate_mult field in struct od_cpu_dbs_info_s is used by the code shared with the conservative governor and to access it that code has to do an ugly governor type check. However, first of all it is ever only used for policy->cpu, so it is per-policy rather than per-CPU and second, it is initialized to 1 by cpufreq_governor_start(), so if the conservative governor never modifies it, it will have no effect on the results of any computations. For these reasons, move rate_mult to struct policy_dbs_info (as a common field). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-03-09 14:41:04 +01:00
Rafael J. Wysocki	4cccf75557	cpufreq: governor: Get rid of the ->gov_check_cpu callback The way the ->gov_check_cpu governor callback is used by the ondemand and conservative governors is not really straightforward. Namely, the governor calls dbs_check_cpu() that updates the load information for the policy and the invokes ->gov_check_cpu() for the governor. To get rid of that entanglement, notice that cpufreq_governor_limits() doesn't need to call dbs_check_cpu() directly. Instead, it can simply reset the sample delay to 0 which will cause a sample to be taken immediately. The result of that is practically equivalent to calling dbs_check_cpu() except that it will trigger a full update of governor internal state and not just the ->gov_check_cpu() part. Following that observation, make cpufreq_governor_limits() reset the sample delay and turn dbs_check_cpu() into a function that will simply evaluate the load and return the result called dbs_update(). That function can now be called by governors from the routines that previously were pointed to by ->gov_check_cpu and those routines can be called directly by each governor instead of dbs_check_cpu(). This way ->gov_check_cpu becomes unnecessary, so drop it. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-03-09 14:41:04 +01:00
Viresh Kumar	a23d6d1809	cpufreq: ondemand: Rearrange od_dbs_timer() to avoid updating delay Avoid extra checks in od_dbs_timer() by rearranging updates to the local delay variable in it. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> [ rjw: Changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-03-09 14:41:02 +01:00
Viresh Kumar	aded387b94	cpufreq: conservative: Update sample_delay_ns immediately The ondemand governor already updates sample_delay_ns immediately on updates to the sampling rate, but conservative doesn't do that. It was left out earlier as the code was really too complex to get that done easily. Things are sorted out very well now, however, and the conservative governor can be modified to follow ondemand in that respect. Moreover, since the code needed to implement that in the conservative governor would be identical to the corresponding ondemand governor's code, make that code common and change both governors to use it. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Tested-by: Juri Lelli <juri.lelli@arm.com> Tested-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> [ rjw: Changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-03-09 14:41:01 +01:00
Viresh Kumar	c54df07184	cpufreq: governor: Create and traverse list of policy_dbs to avoid deadlock The dbs_data_mutex lock is currently used in two places. First, cpufreq_governor_dbs() uses it to guarantee mutual exclusion between invocations of governor operations from the core. Second, it is used by ondemand governor's update_sampling_rate() to ensure the stability of data structures walked by it. The second usage is quite problematic, because update_sampling_rate() is called from a governor sysfs attribute's ->store callback and that leads to a deadlock scenario involving cpufreq_governor_exit() which runs under dbs_data_mutex. Thus it is better to rework the code so update_sampling_rate() doesn't need to acquire dbs_data_mutex. To that end, rework update_sampling_rate() to walk a list of policy_dbs objects supported by the dbs_data one it has been called for (instead of walking cpu_dbs_info object for all CPUs). The list manipulation is protected with dbs_data->mutex which also is held around the execution of update_sampling_rate(), it is not necessary to hold dbs_data_mutex in that function any more. Reported-by: Juri Lelli <juri.lelli@arm.com> Reported-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> [ rjw: Subject & changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-03-09 14:40:59 +01:00
Viresh Kumar	c443563036	cpufreq: governor: New sysfs show/store callbacks for governor tunables The ondemand and conservative governors use the global-attr or freq-attr structures to represent sysfs attributes corresponding to their tunables (which of them is actually used depends on whether or not different policy objects can use the same governor with different tunables at the same time and, consequently, on where those attributes are located in sysfs). Unfortunately, in the freq-attr case, the standard cpufreq show/store sysfs attribute callbacks are applied to the governor tunable attributes and they always acquire the policy->rwsem lock before carrying out the operation. That may lead to an ABBA deadlock if governor tunable attributes are removed under policy->rwsem while one of them is being accessed concurrently (if sysfs attributes removal wins the race, it will wait for the access to complete with policy->rwsem held while the attribute callback will block on policy->rwsem indefinitely). We attempted to address this issue by dropping policy->rwsem around governor tunable attributes removal (that is, around invocations of the ->governor callback with the event arg equal to CPUFREQ_GOV_POLICY_EXIT) in cpufreq_set_policy(), but that opened up race conditions that had not been possible with policy->rwsem held all the time. Therefore policy->rwsem cannot be dropped in cpufreq_set_policy() at any point, but the deadlock situation described above must be avoided too. To that end, use the observation that in principle governor tunables may be represented by the same data type regardless of whether the governor is system-wide or per-policy and introduce a new structure, struct governor_attr, for representing them and new corresponding macros for creating show/store sysfs callbacks for them. Also make their parent kobject use a new kobject type whose default show/store callbacks are not related to the standard core cpufreq ones in any way (and they don't acquire policy->rwsem in particular). Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Tested-by: Juri Lelli <juri.lelli@arm.com> Tested-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> [ rjw: Subject & changelog + rebase ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-03-09 14:40:58 +01:00
Viresh Kumar	ff4b17895e	cpufreq: governor: Move common tunables to 'struct dbs_data' There are a few common tunables shared between the ondemand and conservative governors. Move them to struct dbs_data to simplify code. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Tested-by: Juri Lelli <juri.lelli@arm.com> Tested-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> [ rjw: Changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-03-09 14:40:58 +01:00
Viresh Kumar	d0684d3b89	cpufreq: governor: Create generic macro for common tunables Some tunables are present in governor-specific structures, whereas one (min_sampling_rate) is located directly in struct dbs_data. There is a special macro for creating its sysfs attribute and the show/store callbacks, but since more tunables are going to be moved to struct dbs_data, a new generic macro for such cases will be useful, so add it and use it for min_sampling_rate. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Tested-by: Juri Lelli <juri.lelli@arm.com> Tested-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> [ rjw: Subject & changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-03-09 14:40:57 +01:00
Rafael J. Wysocki	bc505475b8	cpufreq: governor: Rearrange governor data structures The struct policy_dbs_info objects representing per-policy governor data are not accessible directly from the corresponding policy objects. To access them, one has to get a pointer to the struct cpu_dbs_info of policy->cpu and use the policy_dbs field of that which isn't really straightforward. To address that rearrange the governor data structures so the governor_data pointer in struct cpufreq_policy will point to struct policy_dbs_info (instead of struct dbs_data) and that will contain a pointer to struct dbs_data. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-03-09 14:40:56 +01:00
Rafael J. Wysocki	d10b5eb5fc	cpufreq: governor: Drop cpu argument from dbs_check_cpu() Since policy->cpu is always passed as the second argument to dbs_check_cpu(), it is not really necessary to pass it, because the function can obtain that value via its first argument just fine. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-03-09 14:40:55 +01:00
Rafael J. Wysocki	e40e7b255e	cpufreq: governor: Rename cpu_common_dbs_info to policy_dbs_info The struct cpu_common_dbs_info structure represents the per-policy part of the governor data (for the ondemand and conservative governors), but its name doesn't reflect its purpose. Rename it to struct policy_dbs_info and rename variables related to it accordingly. No functional changes. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-03-09 14:40:55 +01:00
Rafael J. Wysocki	ea59ee0dc9	cpufreq: governor: Drop the gov pointer from struct dbs_data Since it is possible to obtain a pointer to struct dbs_governor from a pointer to the struct governor embedded in it with the help of container_of(), the additional gov pointer in struct dbs_data isn't really necessary. Drop that pointer and make the code using it reach the dbs_governor object via policy->governor. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-03-09 14:40:55 +01:00
Rafael J. Wysocki	906a6e5aae	cpufreq: governor: Rework cpufreq_governor_dbs() Since it is possible to obtain a pointer to struct dbs_governor from a pointer to the struct governor embedded in it via container_of(), the second argument of cpufreq_governor_init() is not necessary. Accordingly, cpufreq_governor_dbs() doesn't need its second argument either and the ->governor callbacks for both the ondemand and conservative governors may be set to cpufreq_governor_dbs() directly. Make that happen. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Saravana Kannan <skannan@codeaurora.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-03-09 14:40:54 +01:00
Rafael J. Wysocki	7bdad34d08	cpufreq: governor: Rename some data types and variables The ondemand and conservative governors are represented by struct common_dbs_data whose name doesn't reflect the purpose it is used for, so rename it to struct dbs_governor and rename variables of that type accordingly. No functional changes. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-03-09 14:40:54 +01:00
Rafael J. Wysocki	af92618523	cpufreq: governor: Put governor structure into common_dbs_data For the ondemand and conservative governors (generally, governors that use the common code in cpufreq_governor.c), there are two static data structures representing the governor, the struct governor structure (the interface to the cpufreq core) and the struct common_dbs_data one (the interface to the cpufreq_governor.c code). There's no fundamental reason why those two structures have to be separate. Moreover, if the struct governor one is included into struct common_dbs_data, it will be possible to reach the latter from the policy via its policy->governor pointer, so it won't be necessary to pass a separate pointer to it around. For this reason, embed struct governor in struct common_dbs_data. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Saravana Kannan <skannan@codeaurora.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-03-09 14:40:54 +01:00
Rafael J. Wysocki	2bb8d94fb0	cpufreq: governor: Use common mutex for dbs_data protection Every governor relying on the common code in cpufreq_governor.c has to provide its own mutex in struct common_dbs_data. However, there actually is no need to have a separate mutex per governor for this purpose, they may be using the same global mutex just fine. Accordingly, introduce a single common mutex for that and drop the mutex field from struct common_dbs_data. That at least will ensure that the mutex is always present and initialized regardless of what the particular governors do. Another benefit is that the common code does not need a pointer to a governor-related structure to get to the mutex which sometimes helps. Finally, it makes the code generally easier to follow. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Saravana Kannan <skannan@codeaurora.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-03-09 14:40:53 +01:00
Rafael J. Wysocki	9be4fd2c77	cpufreq: governor: Replace timers with utilization update callbacks Instead of using a per-CPU deferrable timer for queuing up governor work items, register a utilization update callback that will be invoked from the scheduler on utilization changes. The sampling rate is still the same as what was used for the deferrable timers and the added irq_work overhead should be offset by the eliminated timers overhead, so in theory the functional impact of this patch should not be significant. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Tested-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>	2016-03-09 14:40:53 +01:00
Rafael J. Wysocki	de1df26b7c	cpufreq: Clean up default and fallback governor setup The preprocessor magic used for setting the default cpufreq governor (and for using the performance governor as a fallback one for that matter) is really nasty, so replace it with __weak functions and overrides. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Saravana Kannan <skannan@codeaurora.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-02-05 02:37:42 +01:00
Viresh Kumar	f08f638b9c	cpufreq: ondemand: update update_sampling_rate() to make it more efficient Currently update_sampling_rate() runs over each online CPU and cancels/queues timers on all policy->cpus every time. This should be done just once for any cpu belonging to a policy. Create a cpumask and keep on clearing it as and when we process policies, so that we don't have to traverse through all CPUs of the same policy. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-12-09 22:26:12 +01:00
Viresh Kumar	70f43e5e79	cpufreq: governor: replace per-CPU delayed work with timers cpufreq governors evaluate load at sampling rate and based on that they update frequency for a group of CPUs belonging to the same cpufreq policy. This is required to be done in a single thread for all policy->cpus, but because we don't want to wakeup idle CPUs to do just that, we use deferrable work for this. If we would have used a single delayed deferrable work for the entire policy, there were chances that the CPU required to run the handler can be in idle and we might end up not changing the frequency for the entire group with load variations. And so we were forced to keep per-cpu works, and only the one that expires first need to do the real work and others are rescheduled for next sampling time. We have been using the more complex solution until now, where we used a delayed deferrable work for this, which is a combination of a timer and a work. This could be made lightweight by keeping per-cpu deferred timers with a single work item, which is scheduled by the first timer that expires. This patch does just that and here are important changes: - The timer handler will run in irq context and so we need to use a spin_lock instead of the timer_mutex. And so a separate timer_lock is created. This also makes the use of the mutex and lock quite clear, as we know what exactly they are protecting. - A new field 'skip_work' is added to track when the timer handlers can queue a work. More comments present in code. Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Ashwin Chaugule <ashwin.chaugule@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-12-09 22:26:00 +01:00
Viresh Kumar	affde5d06a	cpufreq: governor: Pass policy as argument to ->gov_dbs_timer() Pass 'policy' as argument to ->gov_dbs_timer() instead of cdbs and dbs_data. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-12-07 02:20:22 +01:00
Viresh Kumar	e68fe18c5b	cpufreq: ondemand: Work is guaranteed to be pending We are guaranteed to have works scheduled for policy->cpus, as the policy isn't stopped yet. And so there is no need to check that again. Drop it. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-12-07 02:20:21 +01:00
Viresh Kumar	e128c86407	cpufreq: ondemand: Update sampling rate only for concerned policies We are comparing policy->governor against cpufreq_gov_ondemand to make sure that we update sampling rate only for the concerned CPUs. But that isn't enough. In case of governor_per_policy, there can be multiple instances of ondemand governor and we will always end up updating all of them with current code. What we rather need to do, is to compare dbs_data with poilcy->governor_data, which will match only for the policies governed by dbs_data. This code is also racy as the governor might be getting stopped at that time and we may end up scheduling work for a policy, which we have just disabled. Fix that by protecting the entire function with &od_dbs_cdata.mutex, which will prevent against races with policy START/STOP/etc. After these locks are in place, we can safely get the policy via per-cpu dbs_info. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-12-07 02:20:10 +01:00
Viresh Kumar	083701b13c	cpufreq: ondemand: Drop unnecessary locks from update_sampling_rate() 'timer_mutex' is required to sync work-handlers of policy->cpus. update_sampling_rate() is just canceling the works and queuing them again. This isn't protecting anything at all in update_sampling_rate() and is not gonna be of any use. Even if a work-handler is already running for a CPU, cancel_delayed_work_sync() will wait for it to finish. Drop these unnecessary locks. Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-10-28 09:20:04 +01:00
Viresh Kumar	43e0ee361e	cpufreq: governor: split out common part of {cs\|od}_dbs_timer() Some part of cs_dbs_timer() and od_dbs_timer() is exactly same and is unnecessarily duplicated. Create the real work-handler in cpufreq_governor.c and put the common code in this routine (dbs_timer()). Shouldn't make any functional change. Reviewed-and-tested-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-07-21 01:12:01 +02:00
Viresh Kumar	44152cb82d	cpufreq: governor: Keep single copy of information common to policy->cpus Some information is common to all CPUs belonging to a policy, but are kept on per-cpu basis. Lets keep that in another structure common to all policy->cpus. That will make updates/reads to that less complex and less error prone. The memory for cpu_common_dbs_info is allocated/freed at INIT/EXIT, so that it we don't reallocate it for STOP/START sequence. It will be also be used (in next patch) while the governor is stopped and so must not be freed that early. Reviewed-and-tested-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-07-21 01:12:01 +02:00
Viresh Kumar	42994af63c	cpufreq: governor: rename cur_policy as policy Just call it 'policy', cur_policy is unnecessarily long and doesn't have any special meaning. Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-07-17 23:46:48 +02:00
Viresh Kumar	386d46e6d5	cpufreq: governor: Name delayed-work as dwork Delayed work was named as 'work' and to access work within it we do work.work. Not much readable. Rename delayed_work as 'dwork'. Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-07-17 23:46:47 +02:00
Viresh Kumar	732b6d617a	cpufreq: governor: Serialize governor callbacks There are several races reported in cpufreq core around governors (only ondemand and conservative) by different people. There are at least two race scenarios present in governor code: (a) Concurrent access/updates of governor internal structures. It is possible that fields such as 'dbs_data->usage_count', etc. are accessed simultaneously for different policies using same governor structure (i.e. CPUFREQ_HAVE_GOVERNOR_PER_POLICY flag unset). And because of this we can dereference bad pointers. For example consider a system with two CPUs with separate 'struct cpufreq_policy' instances. CPU0 governor: ondemand and CPU1: powersave. CPU0 switching to powersave and CPU1 to ondemand: CPU0 CPU1 store* store* cpufreq_governor_exit() cpufreq_governor_init() dbs_data = cdata->gdbs_data; if (!--dbs_data->usage_count) kfree(dbs_data); dbs_data->usage_count++; Bad pointer dereference There are other races possible between EXIT and START/STOP/LIMIT as well. Its really complicated. (b) Switching governor state in bad sequence: For example trying to switch a governor to START state, when the governor is in EXIT state. There are some checks present in __cpufreq_governor() but they aren't sufficient as they compare events against 'policy->governor_enabled', where as we need to take governor's state into account, which can be used by multiple policies. These two issues need to be solved separately and the responsibility should be properly divided between cpufreq and governor core. The first problem is more about the governor core, as it needs to protect its structures properly. And the second problem should be fixed in cpufreq core instead of governor, as its all about sequence of events. This patch is trying to solve only the first problem. There are two types of data we need to protect, - 'struct common_dbs_data': No matter what, there is going to be a single copy of this per governor. - 'struct dbs_data': With CPUFREQ_HAVE_GOVERNOR_PER_POLICY flag set, we will have per-policy copy of this data, otherwise a single copy. Because of such complexities, the mutex present in 'struct dbs_data' is insufficient to solve our problem. For example we need to protect fetching of 'dbs_data' from different structures at the beginning of cpufreq_governor_dbs(), to make sure it isn't currently being updated. This can be fixed if we can guarantee serialization of event parsing code for an individual governor. This is best solved with a mutex per governor, and the placeholder for that is 'struct common_dbs_data'. And so this patch moves the mutex from 'struct dbs_data' to 'struct common_dbs_data' and takes it at the beginning and drops it at the end of cpufreq_governor_dbs(). Tested with and without following configuration options: CONFIG_LOCKDEP_SUPPORT=y CONFIG_DEBUG_RT_MUTEXES=y CONFIG_DEBUG_PI_LIST=y CONFIG_DEBUG_SPINLOCK=y CONFIG_DEBUG_MUTEXES=y CONFIG_DEBUG_LOCK_ALLOC=y CONFIG_PROVE_LOCKING=y CONFIG_LOCKDEP=y CONFIG_DEBUG_ATOMIC_SLEEP=y Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-06-15 15:42:53 +02:00
Viresh Kumar	8e0484d2b3	cpufreq: governor: register notifier from cs_init() Notifiers are required only for conservative governor and the common governor code is unnecessarily polluted with that. Handle that from cs_init/exit() instead of cpufreq_governor_dbs(). Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-06-15 15:37:12 +02:00
Stratos Karafotis	6393d6a102	cpufreq: ondemand: Eliminate the deadband effect Currently, ondemand calculates the target frequency proportional to load using the formula: Target frequency = C * load where C = policy->cpuinfo.max_freq / 100 Though, in many cases, the minimum available frequency is pretty high and the above calculation introduces a dead band from load 0 to 100 * policy->cpuinfo.min_freq / policy->cpuinfo.max_freq where the target frequency is always calculated to less than policy->cpuinfo.min_freq and the minimum frequency is selected. For example: on Intel i7-3770 @ 3.4GHz the policy->cpuinfo.min_freq = 1600000 and the policy->cpuinfo.max_freq = 3400000 (without turbo). Thus, the CPU starts to scale up at a load above 47. On quad core 1500MHz Krait the policy->cpuinfo.min_freq = 384000 and the policy->cpuinfo.max_freq = 1512000. Thus, the CPU starts to scale at load above 25. Change the calculation of target frequency to eliminate the above effect using the formula: Target frequency = A + B * load where A = policy->cpuinfo.min_freq and B = (policy->cpuinfo.max_freq - policy->cpuinfo->min_freq) / 100 This will map load values 0 to 100 linearly to cpuinfo.min_freq to cpuinfo.max_freq. Also, use the CPUFREQ_RELATION_C in __cpufreq_driver_target to select the closest frequency in frequency_table. This is necessary to avoid selection of minimum frequency only when load equals to 0. It will also help for selection of frequencies using a more 'fair' criterion. Tables below show the difference in selected frequency for specific values of load without and with this patch. On Intel i7-3770 @ 3.40GHz: Without With Load Target Selected Target Selected 0 0 1600000 1600000 1600000 5 170050 1600000 1690050 1700000 10 340100 1600000 1780100 1700000 15 510150 1600000 1870150 1900000 20 680200 1600000 1960200 2000000 25 850250 1600000 2050250 2100000 30 1020300 1600000 2140300 2100000 35 1190350 1600000 2230350 2200000 40 1360400 1600000 2320400 2400000 45 1530450 1600000 2410450 2400000 50 1700500 1900000 2500500 2500000 55 1870550 1900000 2590550 2600000 60 2040600 2100000 2680600 2600000 65 2210650 2400000 2770650 2800000 70 2380700 2400000 2860700 2800000 75 2550750 2600000 2950750 3000000 80 2720800 2800000 3040800 3000000 85 2890850 `2900000` 3130850 3100000 90 3060900 3100000 3220900 3300000 95 3230950 3300000 3310950 3300000 100 3401000 3401000 3401000 3401000 On ARM quad core 1500MHz Krait: Without With Load Target Selected Target Selected 0 0 384000 384000 384000 5 75600 384000 440400 486000 10 151200 384000 496800 486000 15 226800 384000 553200 594000 20 302400 384000 609600 594000 25 378000 384000 666000 702000 30 453600 486000 722400 702000 35 529200 594000 778800 810000 40 604800 702000 835200 810000 45 680400 702000 891600 918000 50 756000 810000 948000 918000 55 831600 918000 1004400 1026000 60 907200 918000 1060800 1026000 65 982800 1026000 1117200 1134000 70 1058400 1134000 1173600 1134000 75 1134000 1134000 1230000 1242000 80 1209600 1242000 1286400 1242000 85 1285200 `1350000` 1342800 `1350000` 90 1360800 1458000 1399200 `1350000` 95 1436400 1458000 1455600 1458000 100 1512000 1512000 1512000 1512000 Tested on Intel i7-3770 CPU @ 3.40GHz and on ARM quad core 1500MHz Krait (Android smartphone). Benchmarks on Intel i7 shows a performance improvement on low and medium work loads with lower power consumption. Specifics: Phoronix Linux Kernel Compilation 3.1: Time: -0.40%, energy: -0.07% Phoronix Apache: Time: -4.98%, energy: -2.35% Phoronix FFMPEG: Time: -6.29%, energy: -4.02% Also, running mp3 decoding (very low load) shows no differences with and without this patch. Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-07-21 13:43:19 +02:00
Stratos Karafotis	880eef0416	cpufreq: ondemand: Remove redundant return statement After commit `dfa5bb6225` (cpufreq: ondemand: Change the calculation of target frequency), this return statement is no longer needed. Reported-by: Henrik Nilsson <Karl.Henrik.Nilsson@gmail.com> Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-11-01 00:44:34 +01:00
Stratos Karafotis	934dac1ea0	cpufreq: governors: Remove duplicate check of target freq in supported range Function __cpufreq_driver_target() checks if target_freq is within policy->min and policy->max range. generic_powersave_bias_target() also checks if target_freq is valid via a cpufreq_frequency_table_target() call. So, drop the unnecessary duplicate check in *_check_cpu(). Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-08-28 22:03:02 +02:00
Rafael J. Wysocki	c49a089c3e	Merge back earlier 'pm-cpufreq' material	2013-08-14 22:21:16 +02:00
Viresh Kumar	d5b73cd870	cpufreq: Use sizeof(ptr) convetion for computing sizes Chapter 14 of Documentation/CodingStyle says: The preferred form for passing a size of a struct is the following: p = kmalloc(sizeof(p), ...); The alternative form where struct name is spelled out hurts readability and introduces an opportunity for a bug when the pointer variable type is changed but the corresponding sizeof that is passed to a memory allocator is not. This wasn't followed consistently in drivers/cpufreq, let's make it more consistent by always following this rule. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-08-07 23:34:10 +02:00
Viresh Kumar	3a3e9e06d0	cpufreq: Give consistent names to cpufreq_policy objects They are called policy, cur_policy, new_policy, data, etc. Just call them policy wherever possible. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-08-07 23:34:10 +02:00
Viresh Kumar	5ff0a26803	cpufreq: Clean up header files included in the core This patch addresses the following issues in the header files in the cpufreq core: - Include headers in ascending order, so that we don't add same many times by mistake. - <asm/> must be included after <linux/>, so that they override whatever they need to. - Remove unnecessary includes. - Don't include files already included by cpufreq.h or cpufreq_governor.h. [rjw: Changelog] Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-08-07 23:34:09 +02:00
Viresh Kumar	6c4640c3ad	cpufreq: rename ignore_nice as ignore_nice_load This sysfs file was called ignore_nice_load earlier and commit `4d5dcc4` (cpufreq: governor: Implement per policy instances of governors) changed its name to ignore_nice by mistake. Lets get it renamed back to its original name. Reported-by: Martin von Gagern <Martin.vGagern@gmx.net> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: 3.10+ <stable@vger.kernel.org> # 3.10+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-08-07 22:25:06 +02:00
Stratos Karafotis	dfa5bb6225	cpufreq: ondemand: Change the calculation of target frequency The ondemand governor calculates load in terms of frequency and increases it only if load_freq is greater than up_threshold multiplied by the current or average frequency. This appears to produce oscillations of frequency between min and max because, for example, a relatively small load can easily saturate minimum frequency and lead the CPU to the max. Then, it will decrease back to the min due to small load_freq. Change the calculation method of load and target frequency on the basis of the following two observations: - Load computation should not depend on the current or average measured frequency. For example, absolute load of 80% at 100MHz is not necessarily equivalent to 8% at 1000MHz in the next sampling interval. - It should be possible to increase the target frequency to any value present in the frequency table proportional to the absolute load, rather than to the max only, so that: Target frequency = C * load where we take C = policy->cpuinfo.max_freq / 100. Tested on Intel i7-3770 CPU @ 3.40GHz and on Quad core 1500MHz Krait. Phoronix benchmark of Linux Kernel Compilation 3.1 test shows an increase ~1.5% in performance. cpufreq_stats (time_in_state) shows that middle frequencies are used more, with this patch. Highest and lowest frequencies were used less by ~9%. [rjw: We have run multiple other tests on kernels with this change applied and in the vast majority of cases it turns out that the resulting performance improvement also leads to reduced consumption of energy. The change is additionally justified by the overall simplification of the code in question.] Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-07-26 01:06:43 +02:00
Jacob Shin	c28375583b	cpufreq: fix NULL pointer deference at od_set_powersave_bias() When initializing the default powersave_bias value, we need to first make sure that this policy is running the ondemand governor. Reported-and-tested-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Jacob Shin <jacob.shin@amd.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-06-25 22:42:37 +02:00
Borislav Petkov	60e6726c7b	cpufreq, ondemand: Remove leftover debug line I don't see how the virtual address of the tuners pointer would be of any help to anyone so remove it. Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-05-13 14:02:31 +02:00
Jacob Shin	fb30809efa	cpufreq: ondemand: allow custom powersave_bias_target handler to be registered This allows for another [arch specific] driver to hook into existing powersave bias function of the ondemand governor. i.e. This allows AMD specific powersave bias function (in a separate AMD specific driver) to aid ondemand governor's frequency transition decisions. Signed-off-by: Jacob Shin <jacob.shin@amd.com> Acked-by: Thomas Renninger <trenn@suse.de> Acked-by: Borislav Petkov <bp@suse.de> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-04-10 13:19:26 +02:00
Stratos Karafotis	9366d84052	cpufreq: governors: Calculate iowait time only when necessary Currently we always calculate the CPU iowait time and add it to idle time. If we are in ondemand and we use io_is_busy, we re-calculate iowait time and we subtract it from idle time. With this patch iowait time is calculated only when necessary avoiding the double call to get_cpu_iowait_time_us. We use a parameter in function get_cpu_idle_time to distinguish when the iowait time will be added to idle time or not, without the need of keeping the prev_io_wait. Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Acked-by: Viresh Kumar <viresh.kumar@linaro.,org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-04-01 01:11:35 +02:00
Viresh Kumar	031299b3be	cpufreq: governors: Avoid unnecessary per cpu timer interrupts Following patch has introduced per cpu timers or works for ondemand and conservative governors. commit `2abfa876f1` Author: Rickard Andersson <rickard.andersson@stericsson.com> Date: Thu Dec 27 14:55:38 2012 +0000 cpufreq: handle SW coordinated CPUs This causes additional unnecessary interrupts on all cpus when the load is recently evaluated by any other cpu. i.e. When load is recently evaluated by cpu x, we don't really need any other cpu to evaluate this load again for the next sampling_rate time. Some sort of code is present to avoid that but we are still getting timer interrupts for all cpus. A good way of avoiding this would be to modify delays for all cpus (policy->cpus) whenever any cpu has evaluated load. This patch does this change and some related code cleanup. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-04-01 01:11:35 +02:00
Viresh Kumar	9d44592018	cpufreq: ondemand: Don't update sample_type if we don't evaluate load again Because we have per cpu timer now, we check if we need to evaluate load again or not (In case it is recently evaluated). Here the 2nd cpu which got timer interrupt updates core_dbs_info->sample_type irrespective of load evaluation is required or not. Which is wrong as the first cpu is dependent on this variable set to an older value. Moreover it would be best in this case to schedule 2nd cpu's timer to sampling_rate instead of freq_lo or hi as that must be managed by the other cpu. In case the other cpu idles in between then also we wouldn't loose much power. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-04-01 01:11:34 +02:00
Viresh Kumar	4d5dcc4211	cpufreq: governor: Implement per policy instances of governors Currently, there can't be multiple instances of single governor_type. If we have a multi-package system, where we have multiple instances of struct policy (per package), we can't have multiple instances of same governor. i.e. We can't have multiple instances of ondemand governor for multiple packages. Governors directory in sysfs is created at /sys/devices/system/cpu/cpufreq/ governor-name/. Which again reflects that there can be only one instance of a governor_type in the system. This is a bottleneck for multicluster system, where we want different packages to use same governor type, but with different tunables. This patch uses the infrastructure provided by earlier patch and implements init/exit routines for ondemand and conservative governors. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-04-01 01:11:34 +02:00
Stratos Karafotis	06eb09d17c	cpufreq: ondemand: Fix typos in comments Fix some typos in comments. Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-02-09 12:56:19 +01:00
Stratos Karafotis	4bd4e42819	cpufreq: ondemand: Replace down_differential tuner with adj_up_threshold In order to avoid the calculation of up_threshold - down_differential every time that the frequency must be decreased, we replace the down_differential tuner with the adj_up_threshold which keeps the difference across multiple checks. Update the adj_up_threshold only when the up_theshold is also updated. Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-02-09 01:18:47 +01:00
Viresh Kumar	4447266b84	cpufreq: governors: Remove code redundancy between governors With the inclusion of following patches: 9f4eb10 cpufreq: conservative: call dbs_check_cpu only when necessary 772b4b1 cpufreq: ondemand: call dbs_check_cpu only when necessary code redundancy between the conservative and ondemand governors is introduced again, so get rid of it. [rjw: Changelog] Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Tested-by: Fabio Baltieri <fabio.baltieri@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-02-02 01:02:44 +01:00

1 2 3 4

153 Коммитов