Questions (and their answers) about Linear Programming and related topics. Keywords: FAQ, LP, Linear Programming Archive-name: linear-programming-faq Last-modified: 1992/10/19 Version: 1.1 Linear Programming - Frequently Asked Questions List (LP-FAQ) Most recent update: October 19, 1992 -------------------------------------------------------------------------- 1. "What is Linear Programming?" A: A linear program (LP) is a problem that can be put into the form minimize cx subject to Ax=b x>=0 where x is a vector to be solved for, A is a matrix of known coefficients, and c and b are vectors of known coefficients. All these entities must have consistent dimensions, of course, and you can add "transposes" to taste. The matrix A is generally not square (hence just doing x=inv(A)*b is not a solution), and usually has more columns than rows. Ax=b is therefore underdetermined, leaving (usually) great latitude in the choice of x with which to minimize cx. Other formulations can be used in this framework. For instance, if you want to maximize instead of minimize, multiply the c vector by -1. If you have constraints that are inequalities rather than equations, you can introduce one new variable (a "slack") for each inequality and treat the augmented row of the matrix as an equation. LP codes will often take care of such "bookkeeping" for you. LP problems are usually solved by a technique known as the Simplex Method, developed in the 1940's and after. Briefly stated, this method works by taking a sequence of square submatrices of A and solving for x, in such a way that successive solutions always improve, until a point is reached where no improvement is possible. A family of LP algorithms known as Interior Point methods has been developed starting in the 1980's, that can be faster for many (but so far not all) problems. LP has a variety of uses, in such areas as petroleum, finance, trans- portation, forestry, and military. -------------------------------------------------------------------------- 2. "Where is there a good code, preferably public domain, to solve Linear Programming problems?" A: It depends on the difficulty of your models. LP technology and computer technology have both made such great leaps that models that were previously considered "large" are now routinely solved. Nowadays, models with a few thousand constraints, and several thousand variables, can be tackled with a PC. Good LP codes on workstations can often handle models with variables in the tens of thousands, or even greater. It's hard to be specific about sizes and speed, a priori, due to the wide variation in things like model structure and difficulty in fac- torizing the basis matrices. There is a recently released public domain code called "lp_solve" that is available on Usenet in the "comp.sources.reviewed" newsgroup. Its author claims to have solved models with up to 30,000 variables and 50,000 constraints. My own experience with this code is not quite so uniformly optimistic; still, for someone who isn't sure just what kind of LP code is needed, it represents a very reasonable first try, and the price is right. That program is archived at anonymous ftp site "ftp.uu.net", in directory "/usenet/comp.sources.reviewed/volume02/lp_solve". For MS-DOS PC users, Prof. Timo Salmi at the University of Vaasa in Finland offers a code called "tslin". You should be able to access it by ftp at garbo.uwasa.fi in directory /pc/ts, or else I suggest contacting Prof. Salmi at ts@uwasa.fi. The consensus is that the LP code published in Numerical Recipes is not at all strong, and should be avoided for heavy-duty use. If your requirement is for a solver that can handle 100-variable models, it might be okay. There is an ACM TOMS routine for LP, #552, available from the netlib server, in directory /netlib/toms. See the section on test models for detail on how to use this server. If your models prove to be too difficult for free software to handle, then you can consider acquiring a commercial LP code. There are dozens of such codes on the market. I have my own opinions, but for reasons of space, generality and fairness, I will not attempt even to list the codes I know of here. Instead I refer you to the annual survey of LP software published in "OR/MS Today", a joint publication of ORSA (Operations Research Society of America) and TIMS (The Institute of Management Science). I think it's likely that you can find a copy of the June, 1992 issue, either through a library, or by contacting a member of these two organizations (most universities probably have several members among the faculty and student body). The survey lists almost fifty actively marketed products. This publication also carries advertisements for many of these products, which may give you additional information to help make a decision. If you have access to one of the math libraries, such as IMSL or NAG, you may be able to use an LP routine from there. There are many considerations in selecting an LP code. Speed is important, but LP is complex enough that different codes go faster on different models; you won't find a "Consumer Reports" 8v) article to say with certainty which code is THE fastest. I usually suggest getting benchmark results for your particular type of model if speed is paramount to you. Benchmarking may also help determine whether a given code has sufficient numerical stability for your kind of models. Other questions you should answer: Can you use a stand-alone code, or do you need a code that can be used as a callable library, or do you require source code? Do you want the flexibility of a code that runs on many platforms and/or operating systems, or do you want code that's tuned to your particular hardware architecture (in which case your hardware vendor may have suggestions)? Is the choice of algorithm (Simplex, Interior Point) important to you? Do you need an interface to a spreadsheet program? Is the purchase price an overriding concern? Is the software offered at an academic discount (assuming you are at a university)? How much hotline support do you think you'll need? It may not always be true that "you get what you pay for," but it is rare that you get more than you pay for. 8v) There is usually a large difference in LP codes, in performance (speed, numerical stability, adaptability to computer architectures) and features, as you climb the price scale. If a code seems overpriced to you, you may not yet understand all of its features. -------------------------------------------------------------------------- 3. "Oh, and we also want to solve it as an integer program. I think there will be only a few thousand variables or so." A: Hmmmm. You want - Nontrivial model size - Integer solutions - Public domain code Pick one or maybe two of the above. You can't have all three. 8v) Integer LP models are ones where the answers must not take fractional values. It may not be obvious that this is a VERY much harder problem than ordinary LP, but it is nonetheless true. Integer models may be ones where only some of the variables are to be integer and others may be real-valued (termed "Mixed Integer LP" or MILP, or "Mixed Integer Programs" or MIP), or they may be ones where all the variables must be integral (termed "Integer LP" or ILP). The class of ILP is often further subdivided into problems where the only legal values are {0,1} ("Binary" or "Zero-One" ILP), and general integer problems. For the sake of generality, the Integer LP problem will be referred to here as MIP, since the other classes can be viewed as special cases of MIP. You should be prepared to solve far smaller MIP models than the corresponding LP model, given a certain amount of time you wish to allow (unless you and your model happen to be very lucky). There exist models that are considered challenging, with mere hundreds of variables. Conversely, some models with tens of thousands of variables solve readily. It all depends, and the best explanations of "why" always seem to happen after the fact. 8v) One exception to this gloomy outlook is that there are certain models whose LP solution always turns out to be integer. Best known of these are the so-called Transportation Problem, Assignment Problem, and Network-Flow Problem. It turns out that these problems are best solved by specialized routines that take major shortcuts in the Simplex Method, and as a result are relatively quick running. See the section on references for a book by Kennington and Helgason, which contains some source code for Netflo. Netflo is available by anonymous ftp at dimacs.rutgers.edu, in directory /pub/netflow/mincost/solver-1, but I don't know the copyright situation (I used to think you had to buy the book to get the code). People are sometimes surprised to learn that integer problems are solved using floating point arithmetic. Although various algorithms for MIP have been studied, most if not all available general purpose large-scale LP codes use a method called "Branch and Bound" to try to find an optimal solution. In a nutshell, B&B solves MIP by solving a sequence of related LP models. Good codes for MIP distinguish themselves more by solving shorter sequences of LP's, than by solving the individual LP's faster. Even moreso than with regular LP, a costly commercial code may prove its value to you if your MIP model is difficult. As a point of interest, the Simplex Method currently retains an advantage over the newer Interior Point methods for solving these sequences of LP's. The public domain code "lp_solve", mentioned above, accepts MIP models, as do a large proportion of the commercial LP codes in the OR/MS Today survey. I have seen mention made of algorithm 333 in the Collected Algorithms from CACM, though I'd be surprised if it was robust enough to solve large models. The better MIP codes have numerous parameters and options to give the user control over the solution strategy. Most have the capability of stopping before an optimum is proved, printing the best answer obtained so far. For many MIP models, stopping early is a practical necessity. Fortunately, a solution that has been proved by the algorithm to be within, say, 1% of optimality often turns out to be the true optimum, and the bulk of the computation time is spent proving the optimality. For many modeling situations, a near-optimal solution is acceptable anyway. Once one accepts that large MIP models are not typically solved to a proved optimal solution, that opens up a broad area of approximate methods, probabilistic methods and heuristics, as well as modifications to B&B. Claims have been made for Genetic Algorithms and Simulated Annealing, though (IMHO) these successes have been problem dependent and difficult to generalize. (A reference for GA is David Goldberg, "Genetic Algorithms in Machine Learning.") When trying to solve a difficult MIP model, it is usually crucial to understand the workings of whatever it is you are modeling, and try to find some insight that will assist your chosen algorithm to work better. A related observation is that the way you formulate your model can be as important as the actual choice of solver. You should consider getting some assistance if this is your first time trying to solve a large (>100 variable) problem. -------------------------------------------------------------------------- 4. "I've written my own optimization code. How can I test it?" A: In light of the comments above, I hope your aims are fairly modest, for there are already a lot of good codes out there. I hope your LP code makes use of sparse matrix techniques, rather than using a tableau form of the Simplex method, because the latter usually ends up being numerically unstable and very slow. If you want to try out your code on some real-world LP models, there is a very nice collection of small-to-medium-size ones on netlib. If you have ftp access, you can try "ftp research.att.com", using "netlib" as the Name, and your email address as the password. Do a "cd lp/data" and look around. There should be a "readme" file, which you would want to look at first. Alternatively, you can reach an e-mail server via "netlib@ornl.gov", to which you can send a message saying "send index from lp/data"; follow the instructions you receive. The Netlib LP files (after you uncompress them) are in a format called MPS, which is described in the section below. There is a collection of MIP models, housed at Rice University. Send an email message containing "send catalog" to softlib@rice.edu, to get started. There is a collection of network-flow codes and models at DIMACS (Rutgers University). Use anonymous FTP at dimacs.rutgers.edu. Start looking in /pub/netflow. Another network generator is called NETGEN and is available on netlib (lp/generators). John Beasley has posted information on his OR-Lib, which contains various optimization test problems. Send e-mail to umtsk99@vaxa.cc.imperial.ac.uk to get started. Or have a look in the Journal of the Operational Research Society, Volume 41, Number 11, Pages 1069-72. There are various test sets for NLP (Non-Linear Programming). Among those I've seen mentioned are - ACM TOMS (Transactions on Mathematical Software), V13 No3 P272 - Publications (listed in another section of this list) by Schittkowski; Hock & Schittkowski; Floudas & Pardalos; Torn; Hughes & Grawiog. Many of the other references also contain various problems that you could use to test a code. The modelling language GAMS comes with about 100 test models, which you might be able to test your code with. -------------------------------------------------------------------------- 5. "What is MPS format?" A: MPS format was named after an early IBM LP product and has emerged as a de facto standard ASCII medium among most of the various commercial LP codes. You will need to write your own reader routine for this, but it's not too hard. The main things to know about MPS format are that it is column oriented (as opposed to entering the model as equations), and everything (variables, rows, etc.) gets a name. MPS format is described in more detail in Murtagh's book, referenced in another section. Here is a little sample model, explained in more detail below: NAME METALS ROWS N VALUE E YIELD L FE L MN L CU L MG G AL L SI COLUMNS BIN1 VALUE 0.03 YIELD 1. BIN1 FE 0.15 CU .03 BIN1 MN 0.02 MG .02 BIN1 AL 0.7 SI .02 BIN2 VALUE 0.08 YIELD 1. BIN2 FE .04 CU .05 BIN2 MN .04 MG .03 BIN2 AL .75 SI .06 BIN3 VALUE 0.17 YIELD 1. BIN3 FE .02 CU .08 BIN3 MN .01 AL .8 BIN3 SI .08 BIN4 VALUE 0.12 YIELD 1. BIN4 FE .04 CU .02 BIN4 MN .02 AL .75 BIN4 SI 0.12 BIN5 VALUE 0.15 YIELD 1. BIN5 FE .02 CU .06 BIN5 MN .02 MG .01 BIN5 SI .02 AL .8 ALUM VALUE 0.21 YIELD 1. ALUM FE .01 CU .01 ALUM AL .97 SI .01 SILCON VALUE 0.38 YIELD 1. SILCON FE .03 SI .97 RHS ALOY1 YIELD 2000. FE 60. ALOY1 CU 100. MN 40. ALOY1 MG 30. AL 1500. ALOY1 SI 300. BOUNDS UP PROD1 BIN1 200.00 UP PROD1 BIN2 750.00 LO PROD1 BIN3 400.00 UP PROD1 BIN3 800.00 LO PROD1 BIN4 100.00 UP PROD1 BIN4 700.00 UP PROD1 BIN5 1500.00 ENDATA MPS is an old format, so it is set up as though you were using punch cards, and is not free format. Fields start in column 1, 5, 15, 25, 40 and 50. Sections of an MPS file are marked by so-called header cards, which are distinguished by their starting in column 1. Although it is typical to use upper-case throughout the file (like I said, MPS has long historical roots), many MPS-readers will accept mixed-case for anything except the header cards, and some allow mixed-case anywhere. The NAME card can be anything you want. The ROWS section defines the names of all the constraints; entries in column 2 or 3 are E for equality rows, L for less-than ( <= ) rows, G for greater-than ( >= ) rows, and N for non- constraining rows (the first of which would be interpreted as the objective function). The largest part of the file is in the COLUMNS section, which is the place where the entries of the A-matrix are put. All entries for a given column must be placed consecutively, although within a column the order of the entries (rows) is irrelevant. Rows not mentioned for a column are assumed to have a coefficient of zero. The RHS section allows one or more right-hand-side vectors to be defined; most people don't bother having more than one. In the above example, its name is ALOY1, and has non-zero values in all 7 of the constraint rows of the problem. Rows not mentioned are assumed to have a right-hand-side of zero. The optional BOUNDS section lets you put lower and upper bounds on individual variables (no * wildcards, unfortunately), instead of having to define extra rows in the matrix. All the bounds that have a given name in column 5 are taken together as a set. Variables not mentioned in a BOUNDS set are taken to be non-negative. There is another optional section called RANGES that I won't go into here. The final card must be the ENDATA, and yes, it is spelled funny. For comparison, here is the same model written out in equation format. Minimize VALUE: 0.03 BIN1 + 0.08 BIN2 + 0.17 BIN3 + 0.12 BIN4 + 0.15 BIN5 + 0.21 ALUM + 0.38 SILCON Subject To YIELD: BIN1 + BIN2 + BIN3 + BIN4 + BIN5 + ALUM + SILCON = 2000 FE: 0.15 BIN1 + 0.04 BIN2 + 0.02 BIN3 + 0.04 BIN4 + 0.02 BIN5 + 0.01 ALUM + 0.03 SILCON <= 60 MN: 0.02 BIN1 + 0.04 BIN2 + 0.01 BIN3 + 0.02 BIN4 + 0.02 BIN5 <= 40 CU: 0.03 BIN1 + 0.05 BIN2 + 0.08 BIN3 + 0.02 BIN4 + 0.06 BIN5 + 0.01 ALUM <= 100 MG: 0.02 BIN1 + 0.03 BIN2 + 0.01 BIN5 <= 30 AL: 0.7 BIN1 + 0.75 BIN2 + 0.8 BIN3 + 0.75 BIN4 + 0.8 BIN5 + 0.97 ALUM >= 1500 SI: 0.02 BIN1 + 0.06 BIN2 + 0.08 BIN3 + 0.12 BIN4 + 0.02 BIN5 + 0.01 ALUM + 0.97 SILCON <= 300 and 0 <= BIN1 <= 200 0 <= BIN2 <= 750 400 <= BIN3 <= 800 100 <= BIN4 <= 700 0 <= BIN5 <= 1500 -------------------------------------------------------------------------- 6. "What software is there for non-linear optimization?" A: I don't claim as much expertise in this area, but the question is frequent enough to be worth addressing. If I get enough feedback, this section might grow large enough to need to be split off as a separate FAQ list. It's unrealistic to expect to find one general NLP code that's going to work for every kind of nonlinear model. Instead, you should try to find a code that fits the problem you are solving. Nonlinear solution techniques can be divided into various categories, such as unconstrained, linearly constrained, convexly constrained, or general. If your problem doesn't fit in any category except "general", you should be prepared to have to use a method that boils down to exhaustive search, i.e., you have an intractable problem (see the comments in the MIP section on Simulated Annealing and Genetic Algorithms). Several of the commercial LP codes referenced above have specialized routines, particularly for Quadratic Programming (linear constraints with a quadratic objective function). Many nonlinear problems can be solved by application of a sequence of LP or QP approximations. There is an ACM TOMS routine for QP, #587, available from the netlib server, in directory /netlib/toms. See the section on test models for detail on how to use this server. Here is a summary of codes mentioned in newsgroups in the past year, not sorted into categories. NPSOL - Stanford University, Office of Technology Licensing, 415-723-0651. MINOS - Stanford University, Office of Technology Licensing, 415-723-0651. NAG Library routine E04UCF. IMSL routine UMINF and UMIDH. Harwell Library routine VF04. Hooke and Jeeves algorithm - reference? MINPAK I and II - Contact Steve Wright, wright@mcs.anl.gov GENOCOP - Zbigniew Michalewicz, zbyszek@mosaic.uncc.edu DFPMIN - Numerical Recipes (Davidon-Fletcher-Powell) Amoeba - Numerical Recipes Brent's Method - Numerical Recipes FSQP - Contact Andre Tits, andre@src.umd.edu CONMIN - Vanderplaats and Associates, Goleta CA NOVA - DOT Products, Houston TX GRG2 - Leon Lasdon, University of Texas, Austin TX -------------------------------------------------------------------------- 7. "What references are there on optimization?" A: Too many to count. Here are a few that I like or have been recommended on the net (I have *not* reviewed them all): Bazaraa & Shetty, Nonlinear Programming, Theory & Applications. Coleman & Li, Large Scale Numerical Optimization, SIAM Books. Dennis & Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Prentice Hall. Fiacco & McCormick, Sequential Unconstrained Minimization Techniques, SIAM Books. Fletcher, R., Practical Methods of Optimization, Wiley 1987. Floudas & Pardalos, A collection of test problems for constrained optimization algorithms, Springer-Verlag, 1990. Forsythe, Malcolm & Moler, Computer Methods for Mathematical Computations, Prentice-Hall. Gill, Murray & Wright, Practical Optimization, Academic Press. Hock & Schittkowski, Test Examples for Nonlinear Programming Codes, Springer-Verlag, 1981. Hughes & Grawiog, Linear Programming: An Emphasis on Decision Making, Addison-Wesley, 1973. Kahaner, Moler & Nash, Numerical Methods and Software, Prentice-Hall. Kennington & Helgason, Algorithms for Network Programming, Wiley, 1980. Kirkpatrick, Gelatt & Vecchi, Optimization by Simulated Annealing, Science, 220 (4598) 671-680 (1983). Luenberger, Linear and Nonlinear Programming, Addison Wesley. More, "Numerical Solution of Bound Constrained Problems", in Computational Techniques & Applications, CTAC-87, Noye & Fletcher (eds), North-Holland, 29-37 (1988). More & Toraldo, Algorithms for Bound Constrained Quadratic Programming Problems, Numerische Mathematik 55, 377-400, 1989. Murtagh, B., Advanced Linear Programming, McGraw-Hill, 1981. Nemhauser, Rinnooy Kan, & Todd (eds), Optimization, North-Holland, 1989. Press, Flannery, Teukolsky & Vetterling , Numerical Recipes, Cambridge, 1986. (Comment: use with care.) Schittkowski, Nonlinear Programming Codes, Springer-Verlag, 1980. Torn & Zilinskas, Global Optimization, Springer-Verlag, 1989. Williams, H.P., Model Building in Mathematical Programming, Wiley 1985. -------------------------------------------------------------------------- 8. "Who maintains this list?" A: John W. Gregory LP Specialist (it says that on my business card, it must be true!) Applications Department Cray Research, Inc. Eagan, MN 55121 USA jwg@cray.com 612-683-3673 I suppose I should say something here to the effect that "the material in this document does not reflect any official position taken by Cray Research, Inc." Also, "use at your own risk", "no endorsement of products mentioned", etc., etc. I probably should have scattered more "IMHO"s around in the text, but that acronym seems weaselly and once you start it's hard to know where to stop. I should have put in a few more smilies here and there too, to assist the humor impaired - be on your toes. 8v) I've tried to keep my own biases (primarily, toward the high end of computing) from dominating what I write here, and other viewpoints that I've missed are welcome. Suggestions, corrections, topics you'd like to see covered, and additional material (particularly on NLP) are solicited. Comments to the effect "who died and made *you* optimal?" will likely result in you getting stuck with maintaining the FAQL instead of me. 8v) I regret that when I started this list I didn't keep careful track of all the contributors whose knowledge I tapped. In several instances, the information herein comes from postings on the net, which I assumed to be fair use and in the public domain. It being too late now to make individual acknowledgements, I offer a blanket THANKS to all who have posted on these topics to the net. This FAQL is also being posted to news.answers, which is archived in the periodic posting archive on pit-manager.mit.edu [18.172.1.27], in the anonymous ftp directory /pub/usenet/news.answers. Copies of this FAQL may be made freely, as long as it is distributed at no charge and with this disclaimer included. --------------------------------------------------------------------------