diff --git a/doc/integration/advanced_search/elasticsearch.md b/doc/integration/advanced_search/elasticsearch.md
index b8df5ee3027c89aa1252cd8df5bf2c7e5bb9fc99..e4cfcc542e347bca1bc700233fcb64cdffc7ce9c 100644
--- a/doc/integration/advanced_search/elasticsearch.md
+++ b/doc/integration/advanced_search/elasticsearch.md
@@ -553,6 +553,19 @@ To enable search with advanced search in GitLab:
 1. Select the **Search with advanced search** checkbox.
 1. Select **Save changes**.
 
+### Enable code search with advanced search
+
+Prerequisites:
+
+- You must have administrator access to the instance.
+
+To enable code search with advanced search in GitLab:
+
+1. In the upper-right corner, select **Admin**.
+1. Select **Settings** > **Search**.
+1. Select the **Code search with advanced search** checkbox.
+1. Select **Save changes**.
+
 ### Advanced search configuration
 
 The following Elasticsearch settings are available:
@@ -562,6 +575,7 @@ The following Elasticsearch settings are available:
 | **Turn on indexing for advanced search** | Turns on or turns off indexing and creates an empty index if one does not already exist. You may want to turn on indexing but turn off search to give the index time to be fully completed, for example. Also, keep in mind that this option doesn't have any impact on existing data, this only enables/disables the background indexer which tracks data changes and ensures new data is indexed. |
 | **Pause indexing for advanced search** | Pauses advanced search indexing. This is useful for cluster migration/reindexing. All changes are still tracked, but they are not committed to the index until resumed. |
 | **Search with advanced search** | Turns on or turns off the advanced search capabilities in search and [advanced vulnerability management](../../user/application_security/vulnerability_report/_index.md#advanced-vulnerability-management). |
+| **Code search with advanced search** | Turns on or turns off code search with advanced search. When this setting is turned off, all code is deleted from your Elasticsearch instance. To turn this setting back on, fully reindex your code. If exact code search is enabled, you should turn off this setting to save resources. |
 | **Requeue indexing workers** | Turns on automatic requeuing of indexing workers. This improves non-code indexing throughput by enqueuing Sidekiq jobs until all documents are processed. Requeuing indexing workers is not recommended for smaller instances or instances with few Sidekiq processes. |
 | **URL** | The URL of your Elasticsearch instance. Use a comma-separated list to support clustering (for example, `http://host1, https://host2:9200`). If your Elasticsearch instance is password-protected, use the `Username` and `Password` fields. Alternatively, use inline credentials such as `http://:@:9200/`. If you use [OpenSearch](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/vpc.html), only connections over ports `80` and `443` are accepted. |
 | **Username** | The `username` of your Elasticsearch instance. |
@@ -722,6 +736,19 @@ To disable search with advanced search in GitLab:
 1. Clear the **Search with advanced search** checkbox.
 1. Select **Save changes**.
 
+### Disable code search with advanced search
+
+Prerequisites:
+
+- You must have administrator access to the instance.
+
+To disable code search with advanced search in GitLab:
+
+1. In the upper-right corner, select **Admin**.
+1. Select **Settings** > **Search**.
+1. Clear the **Code search with advanced search** checkbox.
+1. Select **Save changes**.
+
 ## Pause indexing
 
 Prerequisites:
diff --git a/ee/app/helpers/ee/application_settings_helper.rb b/ee/app/helpers/ee/application_settings_helper.rb
index c7d1d261db535d2721ce2f24c0e0f2a41b9aa563..dc21b2245a2b6c6b88780e7e37877f35b20c530f 100644
--- a/ee/app/helpers/ee/application_settings_helper.rb
+++ b/ee/app/helpers/ee/application_settings_helper.rb
@@ -51,6 +51,7 @@ def visible_attributes
         :elasticsearch_analyzers_smartcn_search,
         :elasticsearch_analyzers_kuromoji_enabled,
         :elasticsearch_analyzers_kuromoji_search,
+        :elasticsearch_code_scope,
         :enforce_namespace_storage_limit,
         :geo_node_allowed_ips,
         :geo_status_timeout,
diff --git a/ee/app/models/ee/application_setting.rb b/ee/app/models/ee/application_setting.rb
index 25b113db46c2f4ca7e0ecfe1be17a39f2508a553..f5da883d349a23caec076c2f9510274743955f7e 100644
--- a/ee/app/models/ee/application_setting.rb
+++ b/ee/app/models/ee/application_setting.rb
@@ -394,6 +394,7 @@ module ApplicationSetting
 
       after_commit :update_personal_access_tokens_lifetime, if: :saved_change_to_max_personal_access_token_lifetime?
       after_commit :trigger_clickhouse_for_analytics_enabled_event
+      after_commit :remove_code_data_from_elasticsearch, if: :elasticsearch_code_scope_opted_out?
     end
 
     class_methods do
@@ -947,5 +948,13 @@ def duo_settings_immutable_on_saas
 
       errors.add(:base, 'Duo settings cannot be modified on GitLab.com')
     end
+
+    def remove_code_data_from_elasticsearch
+      ::Search::Elastic::DeleteWorker.perform_async('task' => 'delete_all_blobs')
+    end
+
+    def elasticsearch_code_scope_opted_out?
+      elasticsearch_code_scope_previously_changed?(from: true, to: false)
+    end
   end
 end
diff --git a/ee/app/services/search/elastic/delete/all_blobs_service.rb b/ee/app/services/search/elastic/delete/all_blobs_service.rb
new file mode 100644
index 0000000000000000000000000000000000000000..b4575a6a69e23f251e793dd38db5d2cb1878585c
--- /dev/null
+++ b/ee/app/services/search/elastic/delete/all_blobs_service.rb
@@ -0,0 +1,27 @@
+# frozen_string_literal: true
+
+module Search
+  module Elastic
+    module Delete
+      class AllBlobsService < BaseService
+        private
+
+        def index_name
+          ::Elastic::Latest::Config.index_name
+        end
+
+        def build_query
+          return {} if Gitlab::CurrentSettings.elasticsearch_code_scope?
+
+          {
+            query: {
+              term: {
+                type: 'blob'
+              }
+            }
+          }
+        end
+      end
+    end
+  end
+end
diff --git a/ee/app/services/search/rake_task_executor_service.rb b/ee/app/services/search/rake_task_executor_service.rb
index ca1fbecdd65b4f384ace31f2cff6b4b9ae0cbdaa..832b63525167e053ba98aa28d0a1d53fc94c00af 100644
--- a/ee/app/services/search/rake_task_executor_service.rb
+++ b/ee/app/services/search/rake_task_executor_service.rb
@@ -524,25 +524,21 @@ def check_handler
     end
 
     def display_search_application_settings
-      setting = ::ApplicationSetting.current
+      setting = Gitlab::CurrentSettings
       current_index_version = helper.get_meta&.dig('created_by')
-
       logger.info("Indexing enabled:\t\t#{setting.elasticsearch_indexing? ? Rainbow('yes').green : 'no'}")
       logger.info("Search enabled:\t\t\t#{setting.elasticsearch_search? ? Rainbow('yes').green : 'no'}")
-      logger.info("Requeue Indexing workers:\t" \
-        "#{setting.elasticsearch_requeue_workers? ? Rainbow('yes').green : 'no'}")
-      logger.info("Pause indexing:\t\t\t" \
-        "#{setting.elasticsearch_pause_indexing? ? Rainbow('yes').green : 'no'}")
+      logger.info("Code search enabled:\t\t#{setting.elasticsearch_code_scope? ? Rainbow('yes').green : 'no'}")
+      logger.info("Requeue Indexing workers:\t#{setting.elasticsearch_requeue_workers? ? Rainbow('yes').green : 'no'}")
+      logger.info("Pause indexing:\t\t\t#{setting.elasticsearch_pause_indexing? ? Rainbow('yes').green : 'no'}")
       logger.info("Indexing restrictions enabled:\t" \
         "#{setting.elasticsearch_limit_indexing? ? Rainbow('yes').yellow : 'no'}")
       logger.info("File size limit:\t\t#{setting.elasticsearch_indexed_file_size_limit_kb} KiB")
       logger.info("Index version:\t\t\t#{current_index_version}")
-      logger.info("Indexing number of shards:\t" \
-        "#{::Elastic::ProcessBookkeepingService.active_number_of_shards}")
-      logger.info("Max code indexing concurrency:\t" \
-        "#{setting.elasticsearch_max_code_indexing_concurrency}")
+      logger.info("Indexing number of shards:\t#{::Elastic::ProcessBookkeepingService.active_number_of_shards}")
+      logger.info("Max code indexing concurrency:\t#{setting.elasticsearch_max_code_indexing_concurrency}")
       logger.info("Prefix:\t\t\t\t#{setting.elasticsearch_prefix}")
-      logger.info("Client adapter:\t\t\t\t#{setting.elasticsearch_client_adapter}")
+      logger.info("Client adapter:\t\t\t#{setting.elasticsearch_client_adapter}")
     end
 
     def display_search_server_info
diff --git a/ee/app/views/admin/application_settings/_elasticsearch_form.html.haml b/ee/app/views/admin/application_settings/_elasticsearch_form.html.haml
index f1e22176a34d40b682c07af47d2dae07525ad7e6..f2288921c8ebafd5da0df2ab1774a5912deede76 100644
--- a/ee/app/views/admin/application_settings/_elasticsearch_form.html.haml
+++ b/ee/app/views/admin/application_settings/_elasticsearch_form.html.haml
@@ -48,6 +48,9 @@
       .form-group
         = f.gitlab_ui_checkbox_component :elasticsearch_search, s_('AdminSettings|Search with advanced search'), checkbox_options: { data: { testid: 'search-checkbox' } }, help_text: s_('AdminSettings|Turn off advanced search until indexing is complete.')
 
+      .form-group
+        = f.gitlab_ui_checkbox_component :elasticsearch_code_scope, s_('AdminSettings|Code search with advanced search'), checkbox_options: { data: { testid: 'code-search-checkbox' } }, help_text: s_('AdminSettings|If exact code search is enabled, you should turn off this setting to save resources.')
+
       .form-group
         = f.gitlab_ui_checkbox_component :elasticsearch_requeue_workers, s_('AdminSettings|Requeue indexing workers'), help_text: s_('AdminSettings|Improve non-code indexing throughput by enqueuing Sidekiq jobs until all documents are processed.')
 
diff --git a/ee/app/workers/search/elastic/delete_worker.rb b/ee/app/workers/search/elastic/delete_worker.rb
index 11d8215a17b05c2b4f3730cbc11745b5981b09a0..acf1428a7ca27c93fb0c619c0e062ae2ee345fc5 100644
--- a/ee/app/workers/search/elastic/delete_worker.rb
+++ b/ee/app/workers/search/elastic/delete_worker.rb
@@ -14,8 +14,9 @@ class DeleteWorker
       pause_control :advanced_search
 
       TASKS = {
-        delete_project_work_items: ::Search::Elastic::Delete::ProjectWorkItemsService,
-        delete_project_vulnerabilities: ::Search::Elastic::Delete::VulnerabilityService
+        delete_all_blobs: ::Search::Elastic::Delete::AllBlobsService,
+        delete_project_vulnerabilities: ::Search::Elastic::Delete::VulnerabilityService,
+        delete_project_work_items: ::Search::Elastic::Delete::ProjectWorkItemsService
       }.freeze
 
       def perform(options = {})
diff --git a/ee/spec/helpers/ee/application_settings_helper_spec.rb b/ee/spec/helpers/ee/application_settings_helper_spec.rb
index 56b0dd8616cefd01172aff4cf03de0c5cb1287c3..0fd1aac3fe67b39e591116bd3959612b586c2dc4 100644
--- a/ee/spec/helpers/ee/application_settings_helper_spec.rb
+++ b/ee/spec/helpers/ee/application_settings_helper_spec.rb
@@ -17,6 +17,7 @@
 
     it 'contains search parameters' do
       expected_fields = %i[
+        elasticsearch_code_scope
         global_search_code_enabled
         global_search_commits_enabled
         global_search_wiki_enabled
diff --git a/ee/spec/models/application_setting_spec.rb b/ee/spec/models/application_setting_spec.rb
index 1b54705ca9721032d05c068980cd5befe0608f5d..0cf98f4326d4ff57a8c9710a0d72e3567607aefa 100644
--- a/ee/spec/models/application_setting_spec.rb
+++ b/ee/spec/models/application_setting_spec.rb
@@ -1507,6 +1507,75 @@
     end
   end
 
+  describe 'callbacks' do
+    describe '#remove_code_data_from_elasticsearch', feature_category: :global_search do
+      context 'when elasticsearch_code_scope is opted out(true -> false)' do
+        before do
+          setting.elasticsearch_code_scope = true
+          setting.save!(validate: false)
+        end
+
+        it 'calls Search::Elastic::DeleteWorker' do
+          expect(Search::Elastic::DeleteWorker).to receive(:perform_async).with('task' => 'delete_all_blobs')
+
+          setting.update!(elasticsearch_code_scope: false) # opted out
+          expect(setting.reload.elasticsearch_code_scope).to be false
+        end
+
+        context 'when some other setting is also changed' do
+          it 'calls Search::Elastic::DeleteWorker' do
+            expect(Search::Elastic::DeleteWorker).to receive(:perform_async).with('task' => 'delete_all_blobs')
+
+            setting.update!(elasticsearch_code_scope: false, elasticsearch_retry_on_failure: 3)
+            expect(setting.reload.elasticsearch_code_scope).to be false
+          end
+        end
+      end
+
+      context 'when elasticsearch_code_scope is opted in(false -> true)' do
+        before do
+          setting.elasticsearch_code_scope = false
+          setting.save!(validate: false)
+        end
+
+        it 'does not call Search::Elastic::DeleteWorker' do
+          expect(Search::Elastic::DeleteWorker).not_to receive(:perform_async)
+
+          setting.update!(elasticsearch_code_scope: true) # opted in
+          expect(setting.reload.elasticsearch_code_scope).to be true
+        end
+      end
+
+      context 'when elasticsearch_code_scope setting is untouched' do
+        context 'when current elasticsearch_code_scope is true' do
+          before do
+            setting.elasticsearch_code_scope = true
+            setting.save!(validate: false)
+          end
+
+          it 'does not call Search::Elastic::DeleteWorker' do
+            expect(Search::Elastic::DeleteWorker).not_to receive(:perform_async)
+
+            setting.update!(elasticsearch_retry_on_failure: 5)
+          end
+        end
+
+        context 'when current elasticsearch_code_scope is false' do
+          before do
+            setting.elasticsearch_code_scope = false
+            setting.save!(validate: false)
+          end
+
+          it 'does not call Search::Elastic::DeleteWorker' do
+            expect(Search::Elastic::DeleteWorker).not_to receive(:perform_async)
+
+            setting.update!(elasticsearch_retry_on_failure: 5)
+          end
+        end
+      end
+    end
+  end
+
   describe 'search curation settings after .create_from_defaults', feature_category: :global_search do
     it { expect(setting.search_max_shard_size_gb).to eq(1) }
     it { expect(setting.search_max_docs_denominator).to eq(100) }
diff --git a/ee/spec/services/search/elastic/delete/all_blobs_service_spec.rb b/ee/spec/services/search/elastic/delete/all_blobs_service_spec.rb
new file mode 100644
index 0000000000000000000000000000000000000000..a300bf45e8a32c5baae71f357fdc52acbce34205
--- /dev/null
+++ b/ee/spec/services/search/elastic/delete/all_blobs_service_spec.rb
@@ -0,0 +1,80 @@
+# frozen_string_literal: true
+
+require 'spec_helper'
+
+RSpec.describe Search::Elastic::Delete::AllBlobsService, feature_category: :global_search do
+  let(:main_index) { Elastic::Latest::Config.index_name }
+
+  describe 'integration', :elastic_delete_by_query, :elasticsearch_settings_enabled do
+    context 'when blobs are present in index', :sidekiq_inline do
+      let_it_be(:project) { create(:project, :small_repo) }
+
+      before do
+        project.repository.index_commits_and_blobs
+        create(:personal_snippet)
+        ensure_elasticsearch_index!
+      end
+
+      context 'when setting elasticsearch_code_scope is disabled' do
+        before do
+          stub_ee_application_setting(elasticsearch_code_scope: false)
+        end
+
+        it 'only deletes all blob documents from the main index' do
+          # Verify index has documents
+          initial_blob_docs, initial_non_blob_docs = docs_in_index_partition_by_type_blobs
+          expect(initial_blob_docs).not_to be_empty
+          expect(initial_non_blob_docs).not_to be_empty
+
+          described_class.execute({})
+
+          # Refresh the index to make deletions visible
+          es_helper.refresh_index(index_name: main_index)
+
+          # Verify only blob documents are deleted
+          final_blob_docs, final_non_blob_docs = docs_in_index_partition_by_type_blobs
+          expect(final_blob_docs).to be_empty
+          expect(final_non_blob_docs).not_to be_empty
+        end
+      end
+
+      context 'when setting elasticsearch_code_scope is enabled' do
+        before do
+          stub_ee_application_setting(elasticsearch_code_scope: true)
+        end
+
+        it 'does not delete any document' do
+          # Verify index has documents
+          initial_blob_docs, initial_non_blob_docs = docs_in_index_partition_by_type_blobs
+          expect(initial_blob_docs).not_to be_empty
+          expect(initial_non_blob_docs).not_to be_empty
+
+          described_class.execute({})
+
+          # Refresh the index to make deletions visible
+          es_helper.refresh_index(index_name: main_index)
+
+          # Verify index still has documents
+          initial_blob_docs, initial_non_blob_docs = docs_in_index_partition_by_type_blobs
+          expect(initial_blob_docs).not_to be_empty
+          expect(initial_non_blob_docs).not_to be_empty
+        end
+      end
+    end
+
+    context 'when no blobs are present in index' do
+      it 'completes successfully without errors' do
+        # Verify index has no documents
+        initial_blobs_docs, initial_non_blobs_docs = docs_in_index_partition_by_type_blobs
+        expect(initial_blobs_docs).to be_empty
+        expect(initial_non_blobs_docs).to be_empty
+
+        expect { described_class.execute({}) }.not_to raise_error
+      end
+    end
+
+    def docs_in_index_partition_by_type_blobs
+      items_in_index(main_index, source: true).partition { |doc| doc['type'] == 'blob' }
+    end
+  end
+end
diff --git a/ee/spec/services/search/rake_task_executor_service_spec.rb b/ee/spec/services/search/rake_task_executor_service_spec.rb
index 940daabec31f849d06fd3ff41e04ce70550ec0b6..0b5f73f0d1f22c1b7fc505f548fcb375fd5e69e2 100644
--- a/ee/spec/services/search/rake_task_executor_service_spec.rb
+++ b/ee/spec/services/search/rake_task_executor_service_spec.rb
@@ -1040,13 +1040,111 @@
       info
     end
 
+    context 'for yes/no settings' do
+      it 'outputs Indexing enabled as yes when indexing is enabled' do
+        allow(settings).to receive(:elasticsearch_indexing?).and_return(true)
+
+        expect(logger).to receive(:info).with(/Indexing enabled:\s+yes/)
+
+        info
+      end
+
+      it 'outputs Indexing enabled as no when indexing is disabled' do
+        expect(logger).to receive(:info).with(/Indexing enabled:\s+no/)
+        allow(settings).to receive(:elasticsearch_indexing?).and_return(false)
+
+        info
+      end
+
+      it 'outputs Search enabled as yes when search is enabled' do
+        allow(settings).to receive(:elasticsearch_search?).and_return(true)
+
+        expect(logger).to receive(:info).with(/Search enabled:\s+yes/)
+
+        info
+      end
+
+      it 'outputs Search enabled as no when search is disabled' do
+        allow(settings).to receive(:elasticsearch_search?).and_return(false)
+
+        expect(logger).to receive(:info).with(/Search enabled:\s+no/)
+
+        info
+      end
+
+      it 'outputs Code search enabled as yes when code search is enabled' do
+        allow(settings).to receive(:elasticsearch_code_scope?).and_return(true)
+
+        expect(logger).to receive(:info).with(/Code search enabled:\s+yes/)
+
+        info
+      end
+
+      it 'outputs Code search enabled as no when code search is disabled' do
+        allow(settings).to receive(:elasticsearch_code_scope?).and_return(false)
+
+        expect(logger).to receive(:info).with(/Code search enabled:\s+no/)
+
+        info
+      end
+
+      it 'outputs Requeue Indexing workers as yes when requeue workers is enabled' do
+        allow(settings).to receive(:elasticsearch_requeue_workers?).and_return(true)
+
+        expect(logger).to receive(:info).with(/Requeue Indexing workers:\s+yes/)
+
+        info
+      end
+
+      it 'outputs Requeue Indexing workers as no when requeue workers is disabled' do
+        allow(settings).to receive(:elasticsearch_requeue_workers?).and_return(false)
+
+        expect(logger).to receive(:info).with(/Requeue Indexing workers:\s+no/)
+
+        info
+      end
+
+      it 'outputs Pause indexing as yes when pause indexing is enabled' do
+        allow(settings).to receive(:elasticsearch_pause_indexing?).and_return(true)
+
+        expect(logger).to receive(:info).with(/Pause indexing:\s+yes/)
+
+        info
+      end
+
+      it 'outputs Pause indexing as no when pause indexing is disabled' do
+        allow(settings).to receive(:elasticsearch_pause_indexing?).and_return(false)
+
+        expect(logger).to receive(:info).with(/Pause indexing:\s+no/)
+
+        info
+      end
+
+      it 'outputs Indexing restrictions enabled as yes when limit indexing is enabled' do
+        allow(settings).to receive(:elasticsearch_limit_indexing?).and_return(true)
+
+        expect(logger).to receive(:info).with(/Indexing restrictions enabled:\s+yes/)
+
+        info
+      end
+
+      it 'outputs Indexing restrictions enabled as no when limit indexing is disabled' do
+        allow(settings).to receive(:elasticsearch_limit_indexing?).and_return(false)
+
+        expect(logger).to receive(:info).with(/Indexing restrictions enabled:\s+no/)
+
+        info
+      end
+    end
+
     it 'outputs indexing and search settings' do
       expected_regex = [
-        /Indexing enabled:\s+yes/,
-        /Search enabled:\s+yes/,
-        /Requeue Indexing workers:\s+no/,
-        /Pause indexing:\s+no/,
-        /Indexing restrictions enabled:\s+no/
+        /File size limit:\s+#{settings.elasticsearch_indexed_file_size_limit_kb} KiB/,
+        /Index version:\s+\d+/,
+        /Indexing number of shards:\s+\d+/,
+        /Max code indexing concurrency:\s+#{settings.elasticsearch_max_code_indexing_concurrency}/,
+        /Prefix:\s+#{settings.elasticsearch_prefix}/,
+        /Client adapter:\s+#{settings.elasticsearch_client_adapter}/
       ]
 
       expected_regex.each do |expected|
diff --git a/ee/spec/workers/search/elastic/delete_worker_spec.rb b/ee/spec/workers/search/elastic/delete_worker_spec.rb
index 7a4602fd6d525a715c9f0ba7a9e873115ae9e869..d5b48ddb31da65b1bb7d9ae04049bd83c75b48a2 100644
--- a/ee/spec/workers/search/elastic/delete_worker_spec.rb
+++ b/ee/spec/workers/search/elastic/delete_worker_spec.rb
@@ -39,10 +39,10 @@
     end
 
     context 'when we pass valid task' do
+      subject(:perform) { described_class.new.perform({ task: task }) }
+
       context 'with delete_project_work_items task' do
-        subject(:perform) do
-          described_class.new.perform({ task: :delete_project_work_items })
-        end
+        let(:task) { :delete_project_work_items }
 
         it 'calls the corresponding service' do
           expect(::Search::Elastic::Delete::ProjectWorkItemsService).to receive(:execute)
@@ -51,20 +51,29 @@
       end
 
       context 'with delete_project_vulnerabilities task' do
-        subject(:perform) do
-          described_class.new.perform({ task: :delete_project_vulnerabilities })
-        end
+        let(:task) { :delete_project_vulnerabilities }
 
         it 'calls the corresponding service' do
           expect(::Search::Elastic::Delete::VulnerabilityService).to receive(:execute)
           perform
         end
       end
+
+      context 'with delete_all_blobs task' do
+        let(:task) { :delete_all_blobs }
+
+        it 'calls the corresponding service' do
+          expect(::Search::Elastic::Delete::AllBlobsService).to receive(:execute)
+          perform
+        end
+      end
     end
 
     context 'when we pass invalid task' do
+      let(:task) { :unknown_task }
+
       it 'raises ArgumentError' do
-        expect { described_class.new.perform({ task: :unknown_task }) }.to raise_error(ArgumentError)
+        expect { perform }.to raise_error(ArgumentError)
       end
     end
   end
diff --git a/locale/gitlab.pot b/locale/gitlab.pot
index 04967cd8ef03eee18ffaebf0060cbacbb1b0c560..2615069c46dc4b1c896896486fe2327ae7a03ff2 100644
--- a/locale/gitlab.pot
+++ b/locale/gitlab.pot
@@ -5640,6 +5640,9 @@ msgstr ""
 msgid "AdminSettings|Code can be imported from enabled sources during project creation. OmniAuth must be configured for GitHub %{github_docs_link_start}%{icon}%{github_docs_link_end} and Bitbucket %{bitbucket_docs_link_start}%{icon}%{bitbucket_docs_link_end}."
 msgstr ""
 
+msgid "AdminSettings|Code search with advanced search"
+msgstr ""
+
 msgid "AdminSettings|Collector host"
 msgstr ""
 
@@ -5787,6 +5790,9 @@ msgstr ""
 msgid "AdminSettings|If GitLab manages your cluster, then GitLab retains your analytics data for 1 year. %{link_start}Learn more about data retention policy%{link_end}."
 msgstr ""
 
+msgid "AdminSettings|If exact code search is enabled, you should turn off this setting to save resources."
+msgstr ""
+
 msgid "AdminSettings|If no unit is written, it defaults to seconds. For example, these are all equivalent: %{oneDayInSeconds}, %{oneDayInHoursHumanReadable}, or %{oneDayHumanReadable}. Minimum value is two hours. %{linkStart}Learn more.%{linkEnd}"
 msgstr ""