Change the type of the documentation fields from string to safehtml.HTML.
This forces the conversions between strings and safehtml.HTML values
to happen at the lowest level, reading and writing to the database.
That increases our confidence that nothing will modify the HTML after
it is read from the DB and before it is used in a template.
Change-Id: If5d65123ce2b69ef3183221048aa6d081b003762
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/240510
Reviewed-by: Julie Qiu <julie@golang.org>
Previously, NewSet and NewContext required the caller to construct a set
that was passed in. The functions now accept experiment names, to make
them easier to use.
Change-Id: Id306902910d51483c48a6a3bc8205cf484694d6a
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/240857
Reviewed-by: Jonathan Amsterdam <jba@google.com>
postgres.DeleteModule is updated to delete the corresponding module_path
and resolved_version row from the version_map table.
Fixes#39633
Change-Id: I9de46b08e535bea52c2ea4dbab1c71a8e1c2c2f7
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/238637
Reviewed-by: Julie Qiu <julie@golang.org>
InMemory now expects the given function to close over things like the
proxy and source clients.
Change-Id: I7b3a2793a824ca29453b19b47b96bdedb2a91010
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/238441
Reviewed-by: Julie Qiu <julie@golang.org>
Add a /poll endpoint that polls the index, but just writes
to module_version_states without enqueuing.
Add an /enqueue endpoint that is identical to /requeue. The name more
accurately reflects that it is for new modules as well as
reprocessing.
Update the worker status page to use the new endpoints.
We'll delete the old endpoints after deploying and changing the
scheduler jobs.
Updates b/158866584.
Change-Id: Id116bf9fd99fa55aaacd71bb4ca6b60770ca8812
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/239480
Reviewed-by: Julie Qiu <julie@golang.org>
Information about experiments and excluded prefixes are added to the
worker homepage.
Change-Id: I7bb7fd1eece434bd4da12e1af384b141c8a0ed41
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/239181
Reviewed-by: Jonathan Amsterdam <jba@google.com>
A /delete endpoint is added to the worker, which is used to delete a
specified module version.
Change-Id: Id50bd9785e9d08c7d399f0fe8f757131767a0819
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/239742
Reviewed-by: Jonathan Amsterdam <jba@google.com>
worker.Server had some special handling for status codes that didn't
align with our error semantics. Refactor so that this is unnecessary.
This is the only usage of FromHTTPStatus in non-test code.
Change-Id: I15a93c4c42d724c43cf70d8abfe19f4092179a1d
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/238457
Reviewed-by: Jonathan Amsterdam <jba@google.com>
The initial approach used sync.Once to synchronize
the assignment of errString. This was still racey, however.
Instead, annotate the error to keep the values together, removing
the need for any sync.
Fixes#39611
Change-Id: Ifb889855a5bd583b36a876dff5e44f85812733d3
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/238041
Reviewed-by: Julie Qiu <julie@golang.org>
Remove the `cfg` global and the functions that accessed it.
Fixes b/145301722.
Change-Id: I58ab9fbd4fc29f66dbc5b120f04c88ee0703ee57
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/238437
Reviewed-by: Julie Qiu <julie@golang.org>
This is the first CL to clean up various TODOs that are outdated, and
replace internal issue links with GitHub issue links.
Updates golang/go#39621
Change-Id: If270e8b2e8198c007cb4aa71ad8486182f4f3380
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/238319
Reviewed-by: Jonathan Amsterdam <jba@google.com>
It was hard to read the args to NewServer, and now that we have two
redis clients, it was error-prone to provide them (order matters).
Using a struct effectively lets the caller name the args.
Change-Id: I0e2e39e09402031fd21a754961a2685c377c75fc
Reviewed-on: https://team-review.git.corp.google.com/c/golang/discovery/+/768540
Reviewed-by: Julie Qiu <julieqiu@google.com>
CI-Result: Cloud Build <devtools-proctor-result-processor@system.gserviceaccount.com>
Fix staticcheck error:
internal/worker/server.go:491:10: error strings should not be capitalized (ST1005)
Change-Id: I7307e897fc7dc6e77075aca96f1fd15a5a338a33
Reviewed-on: https://team-review.git.corp.google.com/c/golang/discovery/+/768314
Reviewed-by: Jonathan Amsterdam <jba@google.com>
We sometimes want to clear our Redis-based cache. Adding an endpoint
to the worker will let us do this more simply and with less chance of
error than running the redis CLI.
Change-Id: I855ea5d906f0cb080b4e2f6d5fa279a6a2e0b949
Reviewed-on: https://team-review.git.corp.google.com/c/golang/discovery/+/768539
CI-Result: Cloud Build <devtools-proctor-result-processor@system.gserviceaccount.com>
Reviewed-by: Julie Qiu <julieqiu@google.com>
ModuleInfo is added, which represents the module info for the new data
model. ModuleInfo is embedded in LegacyModuleInfo.
In order to support the existing overview tab functionality,
GetDirectoryNew was changed to return the module README, regardless of
whether there is a README for the directory.
We will change this logic to display the README for the directory in a
future CL.
Change-Id: I624a6d99b711870826fd7dff9100d4ad47852db2
Reviewed-on: https://team-review.git.corp.google.com/c/golang/discovery/+/766801
Reviewed-by: Jonathan Amsterdam <jba@google.com>
ModuleInfo is renamed to LegacyModuleInfo, as a step towards
deprecating LegacyReadmeFilePath and LegacyReadmeContents.
In a follow up CL, we will add ModuleInfo as an embedded struct
to LegacyModuleInfo.
Change-Id: Ie452420448eec1d13edaf62e548df0b9e2cbbe4b
Reviewed-on: https://team-review.git.corp.google.com/c/golang/discovery/+/766479
CI-Result: Cloud Build <devtools-proctor-result-processor@system.gserviceaccount.com>
Reviewed-by: Jonathan Amsterdam <jba@google.com>
The following structs have been renamed with a Legacy prefix:
* internal.Directory
* internal.Package
* internal.VersionedPackage
* sample.Package
The following fields on internal.ModuleInfo have also been changed:
* ReadmeFilePath
* ReadmeFileContents
This is done to help us distinguish between legacy and method
structs/methods while migrating code to the new data model.
Change-Id: Ibedf71d4db6323ef5aa05d73a0240537ea6073d3
Reviewed-on: https://team-review.git.corp.google.com/c/golang/discovery/+/765160
CI-Result: Cloud Build <devtools-proctor-result-processor@system.gserviceaccount.com>
Reviewed-by: Jonathan Amsterdam <jba@google.com>
- Make it possible to add a label to a context, so it appears in all log messages.
- Add the module and version as a label when fetching.
This will make it easy to find all log messages related to the fetch
of a single module.
Change-Id: I6c564b6aa58a494180d61168e739ef62dd67a64c
Reviewed-on: https://team-review.git.corp.google.com/c/golang/discovery/+/760101
CI-Result: Cloud Build <devtools-proctor-result-processor@system.gserviceaccount.com>
Reviewed-by: Julie Qiu <julieqiu@google.com>
Rather than reading experiments from the context, the experiment set for
an InMemoryQueue is now pass in as an argument. This makes it more
explicit what experiments are being set.
Change-Id: Ib68f567ea5b7ff0fc2157ad2713c76d33827c442
Reviewed-on: https://team-review.git.corp.google.com/c/golang/discovery/+/759927
CI-Result: Cloud Build <devtools-proctor-result-processor@system.gserviceaccount.com>
Reviewed-by: Jonathan Amsterdam <jba@google.com>
At the moment, taskIDChangeInterval is a hardcoded value in
internal/queue. However, we will soon have two task queues running,
which require different change intervals, so this value is now set in
internal/config.
Additionally, the taskIDChangeInterval for the worker is changed to 3
hours.
Change-Id: I498abefce6543005463be7da99a5a778f3a6e973
Reviewed-on: https://team-review.git.corp.google.com/c/golang/discovery/+/758919
CI-Result: Cloud Build <devtools-proctor-result-processor@system.gserviceaccount.com>
Reviewed-by: Jonathan Amsterdam <jba@google.com>
The contents of this directory are not being used.
Change-Id: I0660f09ba8bd481afa806ad67b0c889d8f055ec7
Reviewed-on: https://team-review.git.corp.google.com/c/golang/discovery/+/756689
Reviewed-by: Jonathan Amsterdam <jba@google.com>
CI-Result: Cloud Build <devtools-proctor-result-processor@system.gserviceaccount.com>
Now that we are using basic scaling, we can use a longer timeout for
all requests, including fetches. So the special fetch timeout, and the
detached context and tracing that went with it, are no longer necessary.
We still enforce a timeout on requests via middleware.Timeout, using a value
that comes from GO_DISCOVERY_WORKER_TIMEOUT_MINUTES.
Updates b/141406364.
Change-Id: I39266a3f08b1f7e2fea3fdbec05bf092ea974d9b
Reviewed-on: https://team-review.git.corp.google.com/c/golang/discovery/+/750119
Reviewed-by: Julie Qiu <julieqiu@google.com>
The logic for GetVersionMap is changed so that an explicit module path
is now required. All logic is also now the same for all
requestedVersions (as opposed to a special case for latest).
This function is only used by the frontend to check if a version exists
as part of a frontend fetch, and this behavior makes more sense for that
use case.
Updates golang/go#36811
Updates golang/go#37002
Updates golang/go#37106
Change-Id: I415a2730daa6edc023f80c0c615521047311f35b
Reviewed-on: https://team-review.git.corp.google.com/c/golang/discovery/+/744833
Reviewed-by: Jonathan Amsterdam <jba@google.com>
FetchModule now only returns FetchResult, which has been refactored to
contain any errors that occurred.
This allows us to skip populating various fields in fetchTasks.
Change-Id: I99bf5ce1f10461f42da9f62c77f0e821af926323
Reviewed-on: https://team-review.git.corp.google.com/c/golang/discovery/+/743101
Reviewed-by: Jonathan Amsterdam <jba@google.com>
Had the wrong order of args to time.Parse.
Change-Id: Ice2042d864af1ef899a2c7a75e943c1ceb763b22
Reviewed-on: https://team-review.git.corp.google.com/c/golang/discovery/+/739501
CI-Result: Cloud Build <devtools-proctor-result-processor@system.gserviceaccount.com>
Reviewed-by: Julie Qiu <julieqiu@google.com>
Replace many calls to sample.DefaultModule, and all calls to
sample.DefaultPackage, with calls to the new Module and Package
functions.
Change-Id: I76921e14502585f8ca9a4dba5de01d7055522f3f
Reviewed-on: https://team-review.git.corp.google.com/c/golang/discovery/+/733604
Reviewed-by: Julie Qiu <julieqiu@google.com>
This is the first step towards making sample clearer and less
error-prone.
- Rename Module and Package to DefaultModule and DefaultPackage, so
existing code continues to work. In later CLs, I'll replace most or
all calls to these functions.
- Define ModuleInfo, Module and Package to take as arguments the most
common fields. The Module function does _not_ create a package; it
returns a module with no packages.
- Most important: define AddPackage, which adds a Package to a Module
and correctly updates Module.Directories.
- Remove VersionedPackage, since it's clearer to create one directly.
- Update a couple of call sites, just to get tests to pass. More
updates in a later CL.
Change-Id: I46eb94ba897d4f122483b58435107b8782c6044f
Reviewed-on: https://team-review.git.corp.google.com/c/golang/discovery/+/733619
CI-Result: Cloud Build <devtools-proctor-result-processor@system.gserviceaccount.com>
Reviewed-by: Julie Qiu <julieqiu@google.com>
When we delete the last version of a module, remove it
from imports_unique.
Updates b/154616892.
Change-Id: I988c439ae117cca4b79ea33752f730fb48c2496f
Reviewed-on: https://team-review.git.corp.google.com/c/golang/discovery/+/728094
Reviewed-by: Julie Qiu <julieqiu@google.com>
At the moment, we reprocess and requeue modules using the following
logic:
1. Set all modules to be reprocessed = 505.
2. Requeue modules with status=0 or status >= 500. Prioritize the
following:
- IsLatest: sorted by release vs prerelease modules
- IsBig: hardcoded list of modules we know are big
This poses the following problems:
1. Requeue order is not idempotent: priority is given to categories of
modules, but within each category, the order of modules being queued can
change each time requeue is called. This leads to many modules sitting
in the task queue, and a lack of clarity as to how much progress we have
made when looking at the logs.
2. Modules missing from isBig list: there are several modules missing
from the isBig list, but these aren't being accounted for. We
deproritize large modules because they take a really long time to
process and can timeout if too many are being processed at once, so we
want to process them at a slower rate than other modules.
3. Alternative modules have the same priority as non-alternative
modules: we usually don't care about alternative modules, and they will
be deleted from search_documents once identified. These should be
processed after the lastest version of non-alternative modules are
processed to prevent unnecessary deletes.
To address these issues, reprocessing / requeue now follows the
following logic:
1. All modules are reprocess with a 50x status code based on their last
fetch status in module version states.
2. Modules are requeued in the following order (with the exception of large modules):
- Latest version of modules previously with 20x status
- Latest version of bad modules and alternative modules
- Any version of modules previously with 20x status
- Any version of bad modules and alternative modules
- Any module with a status=0 or status=500 (we expect these to already be in the queue)
3. All large modules are queued last, since these take up a lot of time
and need to be processed at a slower rate.
Within each category, modules are sorted as follows:
1. num_packages
2. version DESC
3. module_path
This keeps the order idempotent, and prioritizes smaller and newer
modules. It also allows modules of similar sizes to be processed
together.
Change-Id: I49580ed75bf60cc2698b756882bfdc906f72d935
Reviewed-on: https://team-review.git.corp.google.com/c/golang/discovery/+/725873
Reviewed-by: Jonathan Amsterdam <jba@google.com>
Error messages returned by requests to the worker are cleaned up.
Change-Id: I79354eafc8785bbffd4bc1fb6db93cee7b517371
Reviewed-on: https://team-review.git.corp.google.com/c/golang/discovery/+/729941
CI-Result: Cloud Build <devtools-proctor-result-processor@system.gserviceaccount.com>
Reviewed-by: Jonathan Amsterdam <jba@google.com>
fetchTask.timings is added to time FetchModule and each DB insert. The
results are logged as a single log line, which will be exported to
BigQuery for analysis as needed.
postgres.DeleteModule is rewritten to db.DeleteModule, for consistency
with other exported functions from internal/postgres.
Change-Id: Id6ecd7fccd238fb17a45554ec033ebf404c1020c
Reviewed-on: https://team-review.git.corp.google.com/c/golang/discovery/+/729940
Reviewed-by: Jonathan Amsterdam <jba@google.com>
At the moment, errors that occur during UpsertVersionMap, DeleteModule,
and DeleteOlderVersionFromSearchDocuments are not being logged in
module_version_states.
FetchAndUpdateState is refactored to capture these errors.
Change-Id: Ief15b894c7c6f66f6c70c01d41cd78b83b68aef7
Reviewed-on: https://team-review.git.corp.google.com/c/golang/discovery/+/725872
Reviewed-by: Jonathan Amsterdam <jba@google.com>
fetchAndInsertModule is a private function and implementation detail of
internal/worker. FetchAndUpdateState is used instead for relevant tests.
Change-Id: I39dac1e6272a67248e7adb2b3c540c418f99ba72
Reviewed-on: https://team-review.git.corp.google.com/c/golang/discovery/+/729862
Reviewed-by: Jonathan Amsterdam <jba@google.com>
Additional traces are added to the fetch process so that we can get more
detailed information on what is taking up time in a fetch process.
Change-Id: I8f74a0d2117a18002d64453d222cb08a1fb81ff2
Reviewed-on: https://team-review.git.corp.google.com/c/golang/discovery/+/729482
Reviewed-by: Jonathan Amsterdam <jba@google.com>
When a module is inserted using postgres.InsertModule, it was first
deleted, then inserted. This allowed us to guarantee a clean module
insert, but also created a lot of load on the database. This was
particularly evident when reprocessing our dataset; during these
periods, running the following query showed that most active connections
were due to DeleteModule, which could take several minutes:
SELECT FROM pg_stat_activity WHERE state = 'active' ORDER BY query_start;
Many DeleteModule queries have also failed due to a deadlock.
This is because running DeleteModule results not only in a delete on
module, but also cascaded deletes to several tables:
1. packages
2. paths
3. licenses
4. documentation
5. readmes
6. search_documents
7. imports
8. package_imports
In general when we are inserting a module, it can be can upserted,
instead of deleted then inserted. For a given module version, we never
expect to lose rows in any of the tables above when reprocessing it.
To confirm this, postgres.validateModule has been updated to compare the
existing data in the database, and data from the new module that was
just fetched. If data went missing, we mark the module with a status
derrors.DBInvalidModuleInsert. It is assumed that comparing data only on
the packages, paths, and licenses table is a good enough proxy that data
in the other tables will also be valid.
If an error occurs, the module will be deleted inside
worker.FetchAndUpdateState (as with all fetches with status > 400).
A new derrors.DBExecErr is also added to capture insert/update/delete
errors.
Change-Id: I100ac7e4e14409065517a3268e3e215d0a03bb1c
Reviewed-on: https://team-review.git.corp.google.com/c/golang/discovery/+/728043
Reviewed-by: Jonathan Amsterdam <jba@google.com>
When inserting a new module, upsert the search documents in the same
transaction as the other inserts into modules, packages and so on.
Hopefully this will remove the foreign-key violation we've been seeing.
Updates b/154318694.
Change-Id: Iaaa0c6ff9b51286d29b10e8a72970baa814a2cff
Reviewed-on: https://team-review.git.corp.google.com/c/golang/discovery/+/727144
CI-Result: Cloud Build <devtools-proctor-result-processor@system.gserviceaccount.com>
Reviewed-by: Julie Qiu <julieqiu@google.com>
golang.org/x/discovery is renamed to golang.org/x/pkgsite.
When the repository is open sourced, it will be hosted at
go.googlesource.com/pkgsite.
Change-Id: Ifc3b45b771a385b99179e785447f2a87afcacf87
Reviewed-on: https://team-review.git.corp.google.com/c/golang/discovery/+/724273
Reviewed-by: Jonathan Amsterdam <jba@google.com>
At the moment when we upsert a row into module_version_states, we
delete and repopulate existing rows in package_version_states.
This adds an additional query that is costly and isn't necessary, since
we wouldn't expect there to be packages missing in a future round of
processing. Also, if for some reason that did happen, it would be useful
to have this data captured in package_version_states anyways.
Change-Id: Iceb64235957ade115293bb7c1ff62cdcb2e2073f
Reviewed-on: https://team-review.git.corp.google.com/c/golang/discovery/+/725870
CI-Result: Cloud Build <devtools-proctor-result-processor@system.gserviceaccount.com>
Reviewed-by: Jonathan Amsterdam <jba@google.com>
Speed up the loading of the worker status page.
Change-Id: I1503b5448fa6ef686a91fdc8fbd0490bf354177c
Reviewed-on: https://team-review.git.corp.google.com/c/golang/discovery/+/721278
CI-Result: Cloud Build <devtools-proctor-result-processor@system.gserviceaccount.com>
Reviewed-by: Julie Qiu <julieqiu@google.com>
Introduce internal/xcontext, which provides a way to "detach"
a context from its parent's timeout and cancellation signals.
(Copied from golang.org/x/tools.)
Use it when the worker does a fetch, to prevent the fetch
from being canceled while retaining the parent context's values.
Change-Id: I91fd7a5790b5654983ee72d8054fb74c45f9b417
Reviewed-on: https://team-review.git.corp.google.com/c/golang/discovery/+/720905
Reviewed-by: Julie Qiu <julieqiu@google.com>
Previously experiments were always inactive for the worker, because:
(1) X-Forwarded-For was empty since the request was coming from
Cloud Tasks. This is fixed by setting all experiments with rollout=100
to always be on. (There shouldn't be any case where worker flags
would only be partially on).
(2) Experiments were not being set on the new context produced by
trace.StartSpan and when a contextWithCancel was created and passed to
fetch.FetchModule. These are now set with the new
experiment.WithExperiment function.
Change-Id: Ie14b699bf435fecc8791c3ad435f73afe579812f
Reviewed-on: https://team-review.git.corp.google.com/c/golang/discovery/+/720661
Reviewed-by: Jonathan Amsterdam <jba@google.com>
Make log messages clearer by logging the function that finished when
logging successes.
Change-Id: I5f60683628c6a37c878d3c60208fa7f2d29156c9
Reviewed-on: https://team-review.git.corp.google.com/c/golang/discovery/+/719982
CI-Result: Cloud Build <devtools-proctor-result-processor@system.gserviceaccount.com>
Reviewed-by: Jonathan Amsterdam <jba@google.com>
The /populate-search-documents endpoint was no longer used, and it was
broken: it would have added to search_documents everything from the
packages table that wasn't already there, but we now omit alternative
modules and their packages from search_documents. This would have
put them back.
However, now that we are exploring changes to our search algorithm, we
do need a way to reprocess all search documents. So add a new
endpoint, /repopulate-search-documents, that upserts packages in
search_documents that haven't been updated since a given time.
Change-Id: Icbae9de078774111f3adb61a35dee95c4711dffa
Reviewed-on: https://team-review.git.corp.google.com/c/golang/discovery/+/717851
Reviewed-by: Julie Qiu <julieqiu@google.com>
Simplify the error logic of most handlers by allowing
them to return an error.
Basically the same thing we did to the frontend in
75ed0713f34c997516b5c1c3e3f0e1ddb6cd8bfa.
Leave a couple of handlers as they are because they do something unusual.
Change-Id: Ib7c5f4f4752945a32c84fc3b49987e3712997521
Reviewed-on: https://team-review.git.corp.google.com/c/golang/discovery/+/717849
Reviewed-by: Julie Qiu <julieqiu@google.com>