8i | 9i | 10g | 11g | 12c | 13c | 18c | 19c | 21c | 23ai | 24ai | Misc | PL/SQL | SQL | RAC | WebLogic | Linux
AI Vector Search in Oracle Database 23ai
This article provides a simple example of using the AI Vector Search feature in Oracle database 23ai.
- Choose the Model
- Load the Model
- Generating Vectors (VECTOR Data Type)
- Vector Search using VECTOR_DISTANCE
- Create a Vector Index (optional)
- Considerations
Choose the Model
We can't do anything useful with vectors unless we have some way of generating them from our data. This could be done externally and the vectors loaded into the database, or we can load a model into the database and generate vectors from our data using that model. I know nothing about creating my own model, so I'm going to use one provided by Oracle called all-MiniLM-L12-v2. This model is explained here. We are going to use it to generate vectors from plain text.
Download the ONNX model from here.
Load the Model
We create a directory to hold the model, download it, and unzip the model into that directory.
mkdir -p /u01/models cd /u01/models wget https://adwc4pm.objectstorage.us-ashburn-1.oci.customer-oci.com/p/VBRD9P8ZFWkKvnfhrWxkpPe8K03-JIoM5h_8EJyJcpE80c108fuUjg7R5L5O7mMZ/n/adwc4pm/b/OML-Resources/o/all_MiniLM_L12_v2_augmented.zip unzip -oq all_MiniLM_L12_v2_augmented.zip
We connect to the database, create a test user, create a directory object pointing to the physical directory, and grant access to it for our test user.
conn sys/SysPassword1@//localhost:1521/freepdb1 as sysdba create user if not exists testuser1 identified by testuser1 quota unlimited on users; grant create session, db_developer_role, create mining model to testuser1; create or replace directory model_dir as '/u01/models'; grant read, write on directory model_dir to testuser1;
We can now load the model into the database using the DBMS_VECTOR
package.
begin dbms_vector.drop_onnx_model ( model_name => 'ALL_MINILM_L12_V2', force => true); dbms_vector.load_onnx_model ( directory => 'model_dir', file_name => 'all_MiniLM_L12_v2.onnx', model_name => 'ALL_MINILM_L12_V2'); end; /
We see the model information in the USER_MINING_MODELS
view.
column model_name format a30 column algorithm format a10 column mining_function format a15 select model_name, algorithm, mining_function from user_mining_models where model_name = 'ALL_MINILM_L12_V2'; MODEL_NAME ALGORITHM MINING_FUNCTION ------------------------------ ---------- --------------- ALL_MINILM_L12_V2 ONNX EMBEDDING SQL>
Generating Vectors (VECTOR Data Type)
We can now generate vectors using the VECTOR_EMBEDDING
function. In the following example we generate a vector for the text "Quick test". As you can see, the resulting vector is really big considering the size of the text.
set long 1000000 select vector_embedding(all_minilm_l12_v2 using 'Quick test' as data) AS my_vector; MY_VECTOR -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- [-3.8644433E-002,7.27762729E-002,-6.99380785E-003,-7.29616638E-003,8.81515723E-003,-6.36086538E-002,4.39667143E-003,-4.20215279E-002,-1.32307231E-001,-5.83763141E-003,-1.32369539E-002,-1.62914675E-002 ,6.54895976E-003,-4.98352237E-002,-1.98450759E-002,-4.69920859E-002,1.03937693E-001,-8.96753371E-002,-2.77858647E-003,4.13947217E-002,-6.51627034E-002,-1.0990192E-001,-8.73052608E-003,2.533352E-002,-1 .42030744E-002,-2.42071413E-002,1.91592015E-002,4.93750861E-003,6.30867062E-003,-1.24127828E-001,-7.17296917E-003,3.73173878E-002,4.9763605E-002,4.52162512E-002,1.49683282E-002,-2.21796334E-002,-3.679 36343E-002,-6.20212057E-004,7.16803446E-002,5.33913262E-003,1.92087796E-002,-9.91346464E-002,3.90679725E-002,2.22725477E-002,5.04363291E-002,1.81943234E-002,5.34031764E-002,1.44161787E-002,-1.99908093 E-002,-1.20323906E-002,-2.63888612E-002,-4.14666645E-002,6.24738075E-002,-4.68838811E-002,1.16748791E-002,-2.43180897E-002,-3.11982501E-002,-7.57505326E-003,2.25466546E-002,-4.17359509E-002,1.23237111 E-002,4.31706719E-002,-7.83751085E-002,1.24918474E-002,5.42060696E-002,4.33742851E-002,2.52278009E-003,-1.15482165E-002,-9.98713658E-004,-2.12613232E-002,1.00960573E-002,3.17986757E-002,-1.13147125E-0 02,-1.26893111E-002,2.66182888E-002,-7.5068539E-003,-3.70341614E-002,1.9485198E-002,-2.921376E-002,-2.61210185E-002,2.86212768E-002,-9.15901735E-002,1.50552308E-002,-4.9816858E-002,2.29324102E-002,7.8 2513991E-003,4.22973074E-002,3.37974578E-002,-4.23457436E-002,-6.32970929E-002,3.84950414E-002,-1.93851739E-002,1.96237396E-003,-3.91559181E-004,7.80333811E-003,5.63595779E-002,4.45815139E-002,-4.9701 6683E-002,1.36383837E-002,2.76547611E-001,6.35802001E-002,-1.69337578E-002,-3.25948372E-002,2.74621435E-002,-1.84808951E-002,-3.58916447E-002,3.18281911E-003,-3.92074399E-002,-5.03929285E-003,-3.95198 613E-002,2.64223926E-002,5.44404797E-002,-2.97637819E-003,1.0656476E-002,4.55005579E-002,-9.6166715E-002,4.53018732E-002,3.02239601E-002,-1.11025617E-001,6.18583001E-002,8.55141282E-002,-1.51456818E-0 02,-5.6408301E-002,-5.93017961E-004,1.07500203E-001,-6.81523681E-002,1.85917076E-002,3.75313126E-002,-3.27163152E-002,-4.72421981E-002,5.59753105E-002,2.20437739E-002,2.74991728E-002,2.6306238E-002,-5 MY_VECTOR -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- .36230095E-002,6.86871773E-003,5.06029651E-003,8.86866376E-002,3.97678688E-002,-1.4952058E-003,-1.11991599E-001,-1.48920929E-002,-1.42183332E-002,5.44683374E-002,-5.46902604E-002,-3.37714627E-002,-3.9 3099897E-002,-8.8875033E-003,-2.50034444E-002,-3.8671799E-002,7.15422854E-002,-1.72948167E-002,5.7217218E-002,1.23445876E-002,-6.25340268E-002,-1.97963398E-002,4.08164188E-002,9.2235636E-003,2.3645639 4E-002,-4.27598739E-003,-1.24366455E-001,8.28649029E-002,-5.27118742E-002,-1.11121098E-002,4.35752161E-002,1.35777248E-002,-2.25060564E-002,4.55260463E-002,3.89738753E-002,-8.9330703E-002,1.17293425E- 001,5.51190637E-002,-2.56631747E-002,-5.30632809E-002,-8.39535818E-002,4.83218301E-003,6.57674894E-002,8.87271166E-002,-1.52742919E-002,1.05254455E-002,-1.58144645E-002,-3.10783144E-002,-2.9069094E-00 2,7.04223011E-003,-3.09849512E-002,-4.46299138E-003,-7.20088407E-002,-7.05658719E-002,4.65546325E-002,1.10276632E-001,3.60871851E-002,1.86070222E-002,-6.10643029E-002,3.21829244E-002,-1.43657476E-002, -6.75653145E-002,8.07486102E-002,1.68783069E-002,-1.0059043E-001,-7.55800903E-002,-1.69591829E-002,-4.45710421E-002,-8.60542338E-003,4.33787964E-002,4.29520719E-002,3.94066162E-002,8.67492054E-003,-8. 52123275E-002,1.20206453E-001,-1.14268251E-001,-1.70285124E-002,8.87670461E-003,-4.69080843E-002,-3.02138515E-002,4.61057499E-002,-4.92520146E-002,1.56181203E-002,-9.27053615E-002,-6.08294345E-002,1.4 6451779E-002,-1.8469112E-002,-1.40407547E-001,5.35490811E-002,5.85880674E-033,7.62652978E-002,-3.0770693E-002,-6.7476593E-003,1.03074148E-001,7.20860213E-002,-9.75818858E-002,1.51840553E-001,7.4332207 4E-002,-2.99238227E-002,9.39518213E-002,1.50299622E-002,4.35530245E-002,-7.58084841E-003,-7.49263093E-002,-5.07647246E-002,4.0109925E-002,-7.43360296E-002,4.62087467E-002,9.6142469E-003,3.1514265E-004 ,6.2026035E-002,1.57011151E-002,3.29307579E-002,5.6974791E-002,-7.89973959E-002,9.78369173E-003,1.16775157E-002,-3.65987495E-002,-5.30386977E-002,-1.22491308E-002,5.65312728E-002,3.41438502E-002,-4.26 849015E-002,9.84478667E-002,1.52464816E-003,-6.92429468E-002,9.64930728E-002,-1.85021404E-002,4.28027436E-002,-4.41830456E-002,-2.54553054E-002,5.20384647E-002,-1.3808256E-002,-1.59469489E-002,2.10003 MY_VECTOR -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 518E-002,-1.85695086E-002,2.29395907E-002,1.91418286E-002,4.09490913E-002,2.35137846E-002,-3.91655453E-002,3.57466601E-002,4.80409786E-002,-1.02699241E-002,1.45040052E-002,-4.27465513E-002,-5.0925903E -002,-7.12790117E-002,-9.191221E-002,1.79740768E-002,-3.53490785E-002,-2.26370115E-002,1.64974518E-002,1.05952621E-001,-3.52565721E-002,-3.41151282E-002,-5.72828352E-002,-3.10265515E-002,6.9757998E-00 2,-2.50361152E-002,-3.91423702E-002,1.37532372E-002,6.76134787E-003,-5.15896529E-002,-3.5578981E-002,6.91288859E-002,-3.47342566E-002,-1.09835267E-002,-2.45214235E-002,-3.24611999E-002,1.08489944E-002 ,8.2179103E-003,-2.97738314E-002,4.81431969E-002,-5.78631386E-002,2.85624769E-002,4.0292047E-002,2.82907616E-002,-3.79493125E-002,8.59350059E-003,1.02058621E-002,2.48055886E-002,7.16195907E-003,-6.245 35345E-002,-3.2572642E-002,4.26035373E-033,-7.57267745E-003,-4.15649898E-002,-4.98135239E-002,1.02479551E-002,3.28872576E-002,1.50394207E-003,-6.39199391E-002,-7.5357914E-002,-2.46184096E-002,-3.06450 184E-002,4.16100062E-002,7.04020485E-002,-8.15085992E-002,2.55300701E-002,1.8981915E-002,4.26408909E-002,-2.19986513E-002,7.1362392E-003,-3.42554934E-002,3.70068592E-003,-3.15256999E-003,1.41581176E-0 02,5.00134267E-002,7.54985586E-002,6.42605498E-002,7.55612329E-002,1.52721843E-002,1.15661159E-001,-2.45987438E-002,1.08358078E-002,5.02406172E-002,6.28810897E-002,-5.52952439E-002,-5.51969372E-002,-4 .60022828E-003,-1.46539817E-002,6.40283972E-002,5.18338121E-002,2.51765884E-002,6.45218417E-002,-8.35603774E-002,3.57579961E-002,6.28177961E-003,3.15946154E-003,2.32084282E-002,4.76812199E-002,-5.4778 5498E-003,-1.06323607E-001,-1.45862792E-002,-5.92180602E-002,-1.59236379E-002,-1.90922264E-002,4.61262129E-002,2.41158772E-002,-7.9021994E-003,1.11448221E-001,1.11205485E-002,-2.0573834E-002,-4.086579 75E-002,5.54621816E-002,5.37177958E-002,4.76263165E-002,-3.29907499E-002,4.3731384E-002] SQL>
We can use the VECTOR_EMBEDDING
function in DML, which is what we will do in the next example.
We need some text to use in our test, so we grab some movie quotes from Hugging Face. We will use the existing directory as it already has the correct setup.
cd /u01/model wget https://huggingface.co/datasets/ygorgeurts/movie-quotes/resolve/main/movie_quotes.csv?download=true -O movie_quotes.csv
We connect to the database and create a new table from the CSV file of movie quotes.
conn testuser1/testuser1@//localhost:1521/freepdb1 drop table if exists movie_quotes purge; create table movie_quotes as select movie_quote, movie, movie_type, movie_year from external ( ( movie_quote varchar2(400), movie varchar2(200), movie_type varchar2(50), movie_year number(4) ) type oracle_loader default directory model_dir access parameters ( records delimited by newline skip 1 badfile model_dir logfile model_dir:'moview_quotes_ext_tab_%a_%p.log' discardfile model_dir fields csv with embedded terminated by ',' optionally enclosed by '"' missing field values are null ( movie_quote char(400), movie, movie_type, movie_year ) ) location ('movie_quotes.csv') reject limit unlimited ); desc movie_quotes Name Null? Type ----------------------------------------- -------- ---------------------------- MOVIE_QUOTE VARCHAR2(400) MOVIE VARCHAR2(200) MOVIE_TYPE VARCHAR2(50) MOVIE_YEAR NUMBER(4) SQL>
We add a new column to hold the vector data for each movie quote. we are using the new VECTOR
data type.
alter table movie_quotes add ( movie_quote_vector vector ); desc movie_quotes Name Null? Type ----------------------------------------- -------- ---------------------------- MOVIE_QUOTE VARCHAR2(400) MOVIE VARCHAR2(200) MOVIE_TYPE VARCHAR2(50) MOVIE_YEAR NUMBER(4) MOVIE_QUOTE_VECTOR VECTOR(*, *) SQL>
We populate the new column by generating vectors from the movie quotes.
update movie_quotes set movie_quote_vector = vector_embedding(all_minilm_l12_v2 using movie_quote as data); commit;
Vector Search using VECTOR_DISTANCE
We perform a search using the VECTOR_DISTANCE
function. This accepts two vectors and as the name suggests, returns a distance between them. Since the model we used generates vectors for text data, we would expect the vector distance to be smaller for two vectors that are similar. In the following examples we create a vector from our search text and order the output of the query by the vector distance between our search text and the quote text.
First we ask for "Films with motivational speaking in them".
variable search_text varchar2(100); exec :search_text := 'Films with motivational speaking in them'; set linesize 200 column movie format a50 column movie_quote format a100 SELECT vector_distance(movie_quote_vector, (vector_embedding(all_minilm_l12_v2 using :search_text as data))) as distance, movie, movie_quote FROM movie_quotes order by 1 fetch approximate first 5 rows only; DISTANCE MOVIE MOVIE_QUOTE ---------- -------------------------------------------------- ---------------------------------------------------------------------------------------------------- 6.786E-001 Once Upon a Time in Hollywood That was the best acting i've ever seen in my whole life. 6.979E-001 Dead Poets Society You must strive to find your own voice because the longer you wait to begin, the less likely you are going to find it at all. 7.169E-001 The Pursuit of Happyness Walk that walk and go forward all the time. Don't just talk that talk, walk it and go forward. Also, the walk didn't have to be long strides; baby steps counted too. Go forward. 7.186E-001 Joker My mother always tells me to smile and put on a happy face. She told me I had a purpose to bring lau ghter and joy to the world. 7.234E-001 Blazing Saddles Men, you are about to embark on a great crusade to stamp out runaway decency in the west. Now you me n will only be risking your lives, whilst I will be risking an almost certain Academy Award nominati on for Best Supporting Actor. SQL>
Next we ask for "Films about war".
variable search_text varchar2(100); exec :search_text := 'Films about war'; set linesize 200 column movie format a50 column movie_quote format a100 SELECT vector_distance(movie_quote_vector, (vector_embedding(all_minilm_l12_v2 using :search_text as data))) as distance, movie, movie_quote FROM movie_quotes order by 1 fetch approximate first 5 rows only; DISTANCE MOVIE MOVIE_QUOTE ---------- -------------------------------------------------- ---------------------------------------------------------------------------------------------------- 5.682E-001 Dr. Strangelove Gentlemen, you can't fight in here! This is the War Room! 6.346E-001 Blazing Saddles Men, you are about to embark on a great crusade to stamp out runaway decency in the west. Now you me n will only be risking your lives, whilst I will be risking an almost certain Academy Award nominati on for Best Supporting Actor. 6.587E-001 Fury Ideals are peaceful; history is violent. 7.243E-001 The Kill Team You give me your loyalty, and I?ll guarantee that each and every one of you will have a chance to be a warrior, to actually be a part of history. 7.253E-001 Dr. No Bond. James Bond SQL>
Create a Vector Index (optional)
If we want to create a vector index, we must set the VECTOR_MEMORY_SIZE
parameter for the root container. The amount of memory we need depends on the size and complexity of the data being indexed.
conn / as sysdba alter system set vector_memory_size = 1G scope=spfile; shutdown immediate; startup;
Provided we've assigned enough memory, we can create vector indexes now.
There are two types of vector indexes.
- Neighbor Partition Vector Indexes
- In-Memory Neighbor Graph Vector Index
Here is an example of each.
conn testuser1/testuser1@//localhost:1521/freepdb1 drop index if exists movie_quotes_vector_idx; -- Neighbor Partition Vector Index create vector index movie_quotes_vector_idx on movie_quotes(movie_quote_vector) organization neighbor partitions distance cosine with target accuracy 95; drop index if exists movie_quotes_vector_idx; -- In-Memory Neighbor Graph Vector Index create vector index movie_quotes_vector_idx on movie_quotes(movie_quote_vector) organization inmemory neighbor graph distance cosine with target accuracy 95;
More details about vector indexes can be found here.
Considerations
Here are some things to consider when using the AI vector search functionality.
- I know very little about AI, machine learning and vectors. This article is just to prove to myself I understand the basic mechanism of using this feature in Oracle database 23ai.
- There is a whole manual dedicated to AI vector search, so clearly a single post can only scratch the surface of this functionality.
- This functionality relies on us having a suitable model for the task we are trying to perform. If we don't have a good model, the results will not be great.
- Vectors are big. This has to be taken into account from both a storage perspective, and a memory perspective, especially when creating vector indexes.
For more information see:
- Oracle AI Vector Search User's Guide
- DBMS_VECTOR
- Manage the Different Categories of Vector Indexes
- Now Available! Pre-built Embedding Generation model for Oracle Database 23ai
- What’s our vector, Victor? My ‘Hello, World’ demo with beer.
Hope this helps. Regards Tim...