Name              Modified      Size        Downloads / Week
leopdo-2012       2012-05-14
src               2011-07-01
licence.txt       2011-07-04    10 Bytes
leopdo.sql.rar    2011-07-04    130.8 kB
readme_en.txt     2011-07-04    3.6 kB
leopdo.war        2011-07-01    49.6 MB
Totals: 6 Items                 49.7 MB     0

Leopdo (beta) Search Engine (2011)

A web search engine and crawler written in Java, with full-text search, vertical search, and a word segmentation system.

/////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
1. Install (requires JDK 6, Tomcat 6.0, and MySQL 5.5 or later)

1) Install MySQL (port: 3306, user/password: root/123456)
   Install the MySQL GUI administrator tool

2) Import the database: leopdo.sql (packed in leopdo.sql.rar)

3) Install Tomcat (port: 80)

4) Copy leopdo.war to Tomcat's webapps\ directory

5) Start Tomcat
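
Before moving on, it can help to confirm that MySQL and Tomcat are actually listening on the ports given above. The following is a minimal sketch, not part of Leopdo: the class name PortCheck, the method isListening, and the 2-second timeout are made up for illustration. It only tests that a TCP connection can be opened; it does not check credentials or that the web application deployed correctly.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not shipped with Leopdo.
class PortCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    static boolean isListening(String host, int port, int timeoutMs) {
        Socket socket = new Socket();
        try {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host not reachable.
            return false;
        } finally {
            try {
                socket.close();
            } catch (IOException ignored) {
                // Nothing useful to do if closing fails.
            }
        }
    }

    public static void main(String[] args) {
        // Ports as given in the install steps: MySQL on 3306, Tomcat on 80.
        System.out.println("MySQL  (3306): " + isListening("localhost", 3306, 2000));
        System.out.println("Tomcat (80):   " + isListening("localhost", 80, 2000));
    }
}
```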


////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
2. Start the search engine: open a browser (IE or Firefox) and open the URLs below in order to run the tasks

1)
http://localhost/leopdo/bot/task/Com_websync.do?task=domain&url=http://www.hao123.com&batch=2&batchHandle=1
Retrieve a website's pages to a depth of 2 (batch=2): starting from the home page, the first-level pages (those linked from the home page) and the second-level pages,
and save them in the database. The website should be a navigation (directory) site such as http://dir.yahoo.com or http://www.hao123.com


2)
http://localhost/leopdo/bot/task/Com_websync.do?task=digdomain&url=http://www.hao123.com&batch=2&update=-1&batch2=0&dimstart=0
Retrieve the home pages of the websites listed on the navigation site (http://www.hao123.com)


3)
http://localhost/leopdo/bot/task/Com_alldomaintask.do?tasktype=html&batch=1&update=-1&dimstart=0&sectionId=1531
Read the home pages from the database and retrieve those websites' pages to a depth of 2. If sectionId=1531, only the home pages with sectionId=1531 are read


4)
http://localhost/leopdo/bot/task/Com_alldomaintask.do?tasktype=key&batch=1&titleOnly=2&kupdate=-1&dimstart=0&sectionId=null
Read the pages of the websites from the database and generate the keywords


5)
Full-text search test:
open http://localhost/leopdo/search.html and enter a keyword
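
The four task URLs above differ only in the base address and the navigation site, so they can be generated rather than typed. The following is a minimal sketch under that assumption. The class LeopdoTaskUrls and the method crawlTaskUrls are hypothetical names, not part of Leopdo, and the query strings are copied unchanged from steps 1) to 4), including sectionId=1531 and sectionId=null.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not shipped with Leopdo.
class LeopdoTaskUrls {

    /**
     * Builds the task URLs of steps 1) to 4) in the order they should be opened.
     * base is the web application root, directorySite is the navigation site to crawl.
     */
    static List<String> crawlTaskUrls(String base, String directorySite) {
        String bot = base + "/bot/task/";
        List<String> urls = new ArrayList<String>();
        // 1) crawl the navigation site itself
        urls.add(bot + "Com_websync.do?task=domain&url=" + directorySite
                + "&batch=2&batchHandle=1");
        // 2) fetch the home pages of the sites it lists
        urls.add(bot + "Com_websync.do?task=digdomain&url=" + directorySite
                + "&batch=2&update=-1&batch2=0&dimstart=0");
        // 3) crawl the pages of those sites
        urls.add(bot + "Com_alldomaintask.do?tasktype=html&batch=1&update=-1"
                + "&dimstart=0&sectionId=1531");
        // 4) generate keywords from the stored pages
        urls.add(bot + "Com_alldomaintask.do?tasktype=key&batch=1&titleOnly=2"
                + "&kupdate=-1&dimstart=0&sectionId=null");
        return urls;
    }

    public static void main(String[] args) {
        List<String> urls = crawlTaskUrls("http://localhost/leopdo", "http://www.hao123.com");
        for (int i = 0; i < urls.size(); i++) {
            System.out.println((i + 1) + ") " + urls.get(i));
        }
    }
}
```

The sketch only prints the URLs; each one still has to be opened in order, and a step should be finished before the next is started.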


//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
3. Other applications based on the Leopdo search engine:

Build a vertical search engine (news, music, books, shopping, etc.)

1)
select * from leopdo.thing where source1 = -1 and rec_create_location = 'hao123.com'
In the result, find the record whose description = 'news'; the id of that record (for example 1531) is the sectionId

2)
http://localhost/leopdo/bot/task/Com_alldomaintask.do?tasktype=html&batch=1&titleOnly=2&kupdate=-1&update=-1&dimstart=0&sectionId=1531
Read the pages of the websites with sectionId=1531 from the database. If update=1, the old pages are updated

3)
http://localhost/leopdo/bot/task/Com_alldomaintask.do?tasktype=key&batch=1&titleOnly=2&kupdate=-1&update=-1&dimstart=0&sectionId=1531
Read the pages of the websites with sectionId=1531 from the database and generate the keywords

4)
delete from leopdo.nthing
Remove all records from the news table

5)
http://localhost/leopdo/searcher/search.do?type=updatenews&date1=2011-06-01&date2=2011-06-02
Read the news data from 2011-06-01 to 2011-06-02, sort the records, and save them in the news table

6)
Browse the latest news:
http://localhost/leopdo/searcher/search.do?type=news
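
The sectionId lookup in step 1) can also be done from Java instead of the MySQL GUI. The following is a rough sketch, not Leopdo code. It assumes that description is a column of leopdo.thing (the README only implies this), that MySQL Connector/J is on the classpath, and that the credentials from the install section are in use. The class and method names are hypothetical, and the try-with-resources syntax needs Java 7 or later. Because the README does not name the id column, the sketch prints every column of each matching row so the id can be read off.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;

// Hypothetical helper, not shipped with Leopdo.
class SectionLookup {

    // The README's query with the site and the description turned into parameters.
    // Assumption: description is a column of leopdo.thing.
    static final String QUERY =
            "select * from leopdo.thing where source1 = -1"
            + " and rec_create_location = ? and description = ?";

    /** Prints every column of each matching row as name=value pairs. */
    static void printMatches(Connection db, String site, String description)
            throws SQLException {
        try (PreparedStatement st = db.prepareStatement(QUERY)) {
            st.setString(1, site);
            st.setString(2, description);
            try (ResultSet rs = st.executeQuery()) {
                ResultSetMetaData meta = rs.getMetaData();
                while (rs.next()) {
                    StringBuilder row = new StringBuilder();
                    for (int i = 1; i <= meta.getColumnCount(); i++) {
                        if (i > 1) {
                            row.append(", ");
                        }
                        row.append(meta.getColumnLabel(i)).append('=').append(rs.getString(i));
                    }
                    System.out.println(row);
                }
            }
        }
    }

    public static void main(String[] args) {
        // Port and credentials as given in the install section.
        String url = "jdbc:mysql://localhost:3306/leopdo";
        try (Connection db = DriverManager.getConnection(url, "root", "123456")) {
            printMatches(db, "hao123.com", "news");
        } catch (SQLException e) {
            System.err.println("Lookup failed: " + e.getMessage());
        }
    }
}
```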


////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
Known issue: Java HTTP timeouts (HTTP connection timeout)

If a task stops because of a timeout, open the URLs below to continue the task:

http://localhost/leopdo/bot/task/Com_clearpool.do?flag=1

http://localhost/leopdo/bot/task/Com_checkq.do
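
The recovery step can be scripted. The README does not say how the timeout shows up, so the following sketch makes several assumptions: the timeout is seen by the client as a SocketTimeoutException on the task request, the two URLs are opened in the order listed above, and the timeout values (5 s to connect, 60 s to read) are arbitrary. TaskRunner and its methods are hypothetical names, not part of Leopdo.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.SocketTimeoutException;
import java.net.URL;

// Hypothetical helper, not shipped with Leopdo.
class TaskRunner {

    /** The two recovery URLs from the README, in the order they are listed. */
    static String[] recoveryUrls(String base) {
        return new String[] {
            base + "/bot/task/Com_clearpool.do?flag=1",
            base + "/bot/task/Com_checkq.do"
        };
    }

    /** Sends a GET request, discards the body, and returns the HTTP status code. */
    static int get(String url, int connectMs, int readMs) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setConnectTimeout(connectMs);
        conn.setReadTimeout(readMs);
        try {
            int status = conn.getResponseCode();
            InputStream in = status < 400 ? conn.getInputStream() : conn.getErrorStream();
            if (in != null) {
                byte[] buf = new byte[4096];
                while (in.read(buf) != -1) {
                    // Drain the response; the content is not used.
                }
                in.close();
            }
            return status;
        } finally {
            conn.disconnect();
        }
    }

    /** Runs one task URL; on a timeout, opens the recovery URLs and returns false. */
    static boolean runWithRecovery(String base, String taskUrl) throws IOException {
        try {
            get(taskUrl, 5000, 60000);
            return true;
        } catch (SocketTimeoutException e) {
            for (String url : recoveryUrls(base)) {
                get(url, 5000, 60000);
            }
            return false;
        }
    }
}
```

A call such as TaskRunner.runWithRecovery("http://localhost/leopdo", taskUrl) returns false when the recovery URLs had to be opened, so the caller can decide whether to check the task again.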

Source: readme_en.txt, updated 2011-07-04